Abstract

We present a fast numerical algorithm for large scale zero-sum stochastic games with perfect information, which combines policy iteration and algebraic multigrid methods. This algorithm can be applied either to a true finite state space zero-sum two player game or to the discretization of an Isaacs equation. We present numerical tests on discretizations of Isaacs equations or variational inequalities. We also present a full multi-level policy iteration, similar to FMG, which substantially improves the computation time for solving some variational inequalities.

1 Introduction

In the present paper, we are interested in solving non-linear finite dimensional equations of the form:

\[ v(x) = \max_{a \in A} \min_{b \in B} \Big( \sum_{y \in X} \mu\, p(y|x,a,b)\, v(y) + r(x,a,b) \Big) \quad \forall x \in X \tag{1} \]

with unknown function $v : X \to \mathbb{R}$, where $X := \{1,\dots,n\}$. Here $A := \{1,\dots,m_1\}$ and $B := \{1,\dots,m_2\}$ are finite sets; the functions $(x,a,b) \in X \times A \times B \mapsto r(x,a,b) \in \mathbb{R}$ and $(x,a,b,y) \in X \times A \times B \times X \mapsto p(y|x,a,b) \in \mathbb{R}_+$ are given such that $\sum_{y \in X} p(y|x,a,b) = 1$; and $0 < \mu < 1$ is a given constant. These equations appear when solving the following particular dynamic games.

An infinitely repeated game, or discrete time dynamic game, or infinite horizon multi-stage game, consists of an infinite sequence of state transitions, where at each step the transition depends on the actions of the players, and each player receives a reward which depends on the state of the game and the actions of all players at this step. The aim of each player is to maximize his own objective function, for instance his payoff, which is the sum of the rewards he received at all steps. The game is stochastic when the state sequence is a random process with a Markov property; the objective function is then the expected payoff. It is a two player zero-sum game when there are two players with opposite rewards, hence player 2 aims to minimize the objective function of player 1. When the game does not stop in finite time (almost surely), one often considers a discounted payoff where the reward at each step $k$ is discounted by some multiplicative factor $\mu^k$, with $0 < \mu < 1$.

Consider in particular a two player zero-sum discounted stochastic game with finite state space $X$ and action spaces $A$ and $B$ for player 1 and player 2 respectively. Denote by $r(x,a,b)$ the reward of player 1 when (at the current step) the state is $x \in X$ and the actions of players 1 and 2 are $a \in A$ and $b \in B$ respectively. Denote by $p(y|x,a,b)$ the transition probability from state $x$ to state $y$ when the actions of players 1 and 2 are $a \in A$ and $b \in B$ respectively. Assume that player 1 plays before player 2, and that at each step, player 1 chooses his action $a \in A$ as a function of the current state $x \in X$, and player 2 chooses his action $b \in B$ as a function of the current state $x \in X$ and of the action $a \in A$ of player 1. Assume each player is maximizing his own objective function. Under the previous finiteness conditions on $(X, A, B)$, there exists a function $v : X \to \mathbb{R}$ which associates to each $x \in X$ the expected payoff $v(x)$ of player 1 when the initial state of the game is $x$. This function is called the value or the value function of the game. It is the unique solution of Equation (1) [54], itself called the dynamic programming equation or Shapley equation of the game. Solving Equation (1) is important since it also gives the optimal stationary strategies of the game; see Section 2 for precise definitions of strategies and details. Discrete time zero-sum stochastic games arise in several domains of applications, such as military operations [43], network flow control [4], and pursuit-evasion problems (although these are often studied in the deterministic case); see [46] for references to other applications.

Equations of the form (1) can also be obtained as special discretizations of partial differential equations associated to differential stochastic games, where the state space $X$ is now a subset of $\mathbb{R}^d$ (see Section 3 for details). For instance the following non-linear elliptic partial differential equation, called Isaacs equation:

\[ \max_{a \in A} \min_{b \in B} \Big( \sum_{i,j=1}^{d} q_{ij}(x,a,b) \frac{\partial^2 v(x)}{\partial x_i \partial x_j} + \sum_{j=1}^{d} g_j(x,a,b) \frac{\partial v(x)}{\partial x_j} - \lambda v(x) + r(x,a,b) \Big) = 0 \quad \forall x \in X \tag{2} \]

allows one to solve a differential game in the same way as (1) solves a discrete time dynamic game. Here $A$, $B$ are either finite sets or subsets of some Euclidean spaces, $\lambda > 0$ is a scalar, and $(x,a,b) \in X \times A \times B \mapsto q(x,a,b) = (q_{ij}(x,a,b))_{i,j=1,\dots,d} \in S_d^+$, the set of positive definite symmetric $d \times d$ matrices, $(x,a,b) \in X \times A \times B \mapsto g(x,a,b) = (g_j(x,a,b))_{j=1,\dots,d} \in \mathbb{R}^d$, and $(x,a,b) \in X \times A \times B \mapsto r(x,a,b) \in \mathbb{R}$ are given functions. Such equations may be applied in particular to pursuit-evasion games (see for instance [3]), but they also appear in solving $H^\infty$ optimal control problems (see for instance |Sj), or risk-sensitive optimal control problems [32], in particular for finance applications [25]. The discretization of Equation (2) with a monotone scheme in the sense of [8] yields an equation of the form (1) which can then be interpreted as the dynamic programming equation of a stochastic game with discrete time and finite state space. Suitable discretization schemes are for instance: Markov chain discretizations [Ml H^, monotone discretizations [5j, full discretizations of semi-Lagrangian type [B], and the max-plus finite element method for deterministic games or control problems. Hence, we are interested in solving discretizations of Equation (2) which have the form of Equation (1), in order to find an approximation of the value of the corresponding differential stochastic game.

In the presence of a discount factor $\mu < 1$, the nonlinear equation (1) can be solved by applying the fixed point iterations which are called, in the optimal control and game literature, value iterations or the value iteration algorithm [10]. The iterations of this method are cheap but their convergence slows considerably as the discount factor $\mu$ approaches one. Moreover, when we discretize Equation (2) with a finite difference or finite element method with a discretization step $h$, we obtain an equation of the form (1) with a discount factor $\mu = 1 - O(\lambda h^2)$; hence when $h$ is small, $\mu$ is close to one and the value iteration method is as slow as the Jacobi or Gauss-Seidel iterations for a discretized linear elliptic equation. Another approach consists in the so called policy iteration algorithm, initially introduced by Howard [35] for one player stochastic games (i.e. stochastic control problems). Later adaptations of this algorithm were proposed for two player games: by Hoffman and Karp |5S] for a special mean-payoff case, by Denardo [53] for approximations of value functions in discounted stochastic games, in Puri's thesis [151 for discounted stochastic games, and by Cochet-Terrasson and Gaubert [TH] for the general mean-payoff case. In all cases, the policy iteration algorithm converges faster than the value iteration algorithm, and in practice it ends in a few steps (see for instance [24] for numerical examples in the case of deterministic games).

A (feedback) policy (or pure Markovian stationary strategy, see Section 2 below) $\alpha : X \to A$ for the first player is a function which maps any $x \in X$ to an action $a \in A$. Starting with an initial policy for player 1, the policy iteration algorithm for the two player zero-sum stochastic game consists in applying successively a policy evaluation step followed by a policy improvement step. The policy evaluation step amounts to computing the value of the game for the current policy $\alpha$, that is the solution $v$ of (1) where, instead of taking the maximum of the expression inside the "max", one evaluates it with $a = \alpha(x)$. The policy improvement step consists in finding the optimal policy for the current value function $v$, that is the policy optimizing the expression inside the "max" in (1) when the value function is $v$. The value functions of the policy evaluation steps are themselves computed using the policy iteration algorithm for a one-player game. The policy iteration algorithm is explained in more general settings in Section 4. It stops after a finite number of steps when the sets of actions are finite, see [HJ [T31 \5U\ for one player games and [JHl [Sj for two player games. In addition, under regularity assumptions on the maps $r$ and $p$, the policy iteration algorithm for a one player game with infinite action spaces is equivalent to Newton's method, and thus can have a super-linear convergence in the neighborhood of the solution, see [5T1 [TS] for superlinear convergence under general regularity assumptions, and |5ll [2l |5j for order $p > 1$ superlinear convergence under additional regularity and strong convexity assumptions.
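The nested structure described above (an outer improvement loop for player 1, an inner one-player policy iteration for player 2) can be sketched as follows. This is a minimal illustration, not the authors' C implementation: the array layout `P[x, a, b, y]`, `r[x, a, b]`, the function names, the tolerance, and the use of a dense direct solver in place of multigrid are all assumptions made for the example, and the action sets are taken to be the same in every state.

```python
import numpy as np

def evaluate(P, r, mu, alpha, beta):
    """Value of fixed pure stationary policies: solve (I - mu*M) v = r_pol."""
    n = P.shape[0]
    xs = np.arange(n)
    b = beta[xs, alpha]                 # action of player 2 given alpha(x)
    M = P[xs, alpha, b, :]              # M[x, y] = p(y | x, alpha(x), b(x))
    return np.linalg.solve(np.eye(n) - mu * M, r[xs, alpha, b])

def policy_iteration(P, r, mu, tol=1e-10, max_iter=1000):
    """Two player policy iteration (hypothetical dense sketch)."""
    n, n_a, _, _ = P.shape
    xs = np.arange(n)
    alpha = np.zeros(n, dtype=int)          # policy of player 1 (max)
    beta = np.zeros((n, n_a), dtype=int)    # policy of player 2 (min)
    for _ in range(max_iter):
        # Inner loop: one-player policy iteration for the minimizer.
        for _ in range(max_iter):
            v = evaluate(P, r, mu, alpha, beta)
            Q = mu * (P @ v) + r            # Q[x, a, b]
            Qa = Q[xs, alpha, :]            # restrict to a = alpha(x)
            worse = Qa.min(axis=1) < v - tol
            if not worse.any():
                break
            beta[xs[worse], alpha[worse]] = Qa[worse].argmin(axis=1)
        else:
            raise RuntimeError("inner loop did not converge")
        # Outer step: improve the policy of the maximizer.
        F = Q.min(axis=2)                   # F[x, a] = min over b
        better = F.max(axis=1) > v + tol
        if not better.any():
            return v, alpha, beta
        alpha[better] = F[better].argmax(axis=1)
    raise RuntimeError("outer loop did not converge")
```

Policies are changed only where they strictly improve the current value (up to `tol`), a common way to avoid cycling between equally good actions.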

Each policy iteration for a one player game (or each iteration in the inner loop of the two player algorithm) requires the solution of a linear system. Indeed, when we fix feedback policies $\alpha : X \to A$ and $\beta : X \to B$ for players 1 and 2 respectively, the system of equations (1) yields a linear system of the form $v = \mu M v + r$, where $v, r \in \mathbb{R}^X$ are respectively the value function of the game and the vector of rewards for the fixed policies $\alpha$ and $\beta$, $0 < \mu < 1$ is the discount factor, and $M \in \mathbb{R}^{X \times X}$ is a Markov matrix whose elements are the transition probabilities $M_{xy} = p(y|x, \alpha(x), \beta(x)) \in \mathbb{R}_+$ for $x, y \in X$ (so each row sum of $M$ equals one). When the dynamic programming equation (1) comes from the discretization of an Isaacs partial differential equation (2), this linear system corresponds to the discretization of a linear elliptic partial differential equation; hence, by using multigrid methods, it may be solved in the best case in a time of the order of the number of discretization points, that is the cardinality $|X|$ of the discretized state space $X$, or the size of the matrix $M$. For general stochastic games on a finite state space $X$, since $M$ is a Markov matrix, the matrix $(I - \mu M)$ of the linear system is an invertible M-matrix [13], and one may expect the same complexity when solving such systems by using an algebraic multigrid method.
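To make the linear system concrete, the following sketch builds $M$ and the reward vector for fixed policies and solves $v = \mu M v + r$ in two ways: with a dense direct solver and with the plain fixed point iteration $v \leftarrow \mu M v + r$. The array layout `P[x, a, b, y]`, `r[x, a, b]` and all function names are assumptions for illustration; here player 2's policy is stored as `beta[x, a]`, following Section 2. Neither solver is the algebraic multigrid method used in the paper.

```python
import numpy as np

def policy_system(P, r, alpha, beta):
    """Markov matrix M and reward vector for fixed pure stationary policies."""
    xs = np.arange(P.shape[0])
    b = beta[xs, alpha]
    return P[xs, alpha, b, :], r[xs, alpha, b]

def solve_direct(M, r_pol, mu):
    """Solve (I - mu*M) v = r_pol with a dense direct solver."""
    return np.linalg.solve(np.eye(M.shape[0]) - mu * M, r_pol)

def solve_fixed_point(M, r_pol, mu, tol=1e-10, max_iter=1_000_000):
    """Iterate v <- mu*M v + r_pol; the error contracts by mu per step."""
    v = np.zeros_like(r_pol)
    for k in range(1, max_iter + 1):
        v_new = mu * (M @ v) + r_pol
        if np.max(np.abs(v_new - v)) <= tol:
            return v_new, k
        v = v_new
    raise RuntimeError("fixed point iteration did not converge")
```

Running `solve_fixed_point` with $\mu$ close to one shows the iteration count growing roughly like $1/(1-\mu)$, which is the slowdown that motivates multigrid solvers.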

In the present paper, we consider the combination of policy iterations with the algebraic multigrid method (AMG) introduced by Brandt, McCormick and Ruge [El [18], see also Ruge and Stüben [S3|. We shall call AMGπ the resulting algorithm. This algorithm can be applied either to a true finite state space zero-sum two player game or to the discretization of an Isaacs equation, although in the present paper we restrict ourselves to numerical tests for the discretization of stochastic differential games, since the AMG algorithm needs some improvements to be applied to arbitrary non symmetric linear systems arising in game problems. Such an association of multigrid methods with policy iteration has already been used and studied in the case of one player games, that is discounted stochastic control problems (see Hoppe [551 [57] and Akian [TJ [2] for Hamilton-Jacobi-Bellman equations or variational inequalities, and Ziv and Shimkin [48' for AMG with learning methods). However, it is new in the case of two player games. We have implemented this algorithm (in C) and shall present numerical tests on discretizations of Isaacs or Hamilton-Jacobi-Bellman equations or variational inequalities, while comparing AMGπ with the combination of policy iterations with direct solvers.

The complexity of two player zero-sum stochastic games is still unsettled; one only knows that it belongs to the complexity class NP ∩ coNP [¥5^. Indeed, the number of policy iterations is bounded by the number of possible policies, which is exponential in the cardinality of $X$. Friedmann has shown [34] that a strategy improvement algorithm requires an exponential number of iterations for a "worst"-case family of games called parity games. This result can be extended to other types of zero-sum stochastic games, in particular to mean-payoff and discounted zero-sum stochastic games, and to undiscounted stochastic control problems (one-player games), as shown by Fearnley [371 [2H]. However, as for Newton's algorithm, convergence can be improved by starting the policy iteration with a good initial guess, close to the solution. With this in mind, we present a full multi-level policy iteration, similar to FMG. It consists in solving the problem at each grid level by performing policy iterations until a convergence criterion is satisfied, then interpolating the strategies and value to the next level in order to initialize the policy iterations of the next level, until the finest level is attained. When at each level policy iterations are combined with the algebraic multigrid method, we shall call FAMGπ the resulting full multi-level policy iteration algorithm. For one-player discounted games with an infinite number of actions and under regularity assumptions, one can show [51 [T] that this kind of full multi-level policy iteration has a computing time of the order of the cardinality $|X|$ of the discretized state space $X$ at the finest level. In Section 6, we give numerical examples on variational inequalities for two player games, the computation time of which is improved substantially using FAMGπ instead of AMGπ.

The paper is organized as follows. The three following sections recall basic definitions on the subject. In Section 2, we introduce the definition of a two player zero-sum stochastic game with finite state space and the corresponding dynamic programming equation. Section 3 is about two player zero-sum stochastic differential games; we recall there the definition of the Isaacs equation, the variational inequalities and the discretization scheme that we use. Section 4 is devoted to the numerical background needed to solve the dynamic programming equation, including the policy iteration algorithm and the algebraic multigrid method. Section 5 describes our algorithms AMGπ and FAMGπ. We present in Section 6 some numerical tests on discretizations of Isaacs equations and variational inequalities. The last section gives concluding remarks.

2 Two player zero-sum stochastic games: the discrete case 

The class of two player zero-sum stochastic games was first introduced by Shapley in the early fifties [51]. We recall in this section the definition of these games in the case of finite state space and discrete time (for more details see [511 US ES]).

We consider a finite state space $X = \{1,\dots,n\}$. A stochastic process $(\xi_k)_{k \ge 0}$ on $X$ gives the state of the game at each time point $k$, called a stage. At each of these stages, both players have the possibility to influence the course of the game.

The stochastic game $\Gamma(x_0)$ starting from $x_0 \in X$ is played in stages as follows. The initial state $\xi_0$ is equal to $x_0$ and known by the players. The player who plays first, say MAX, chooses an action $\zeta_0$ in a set of possible actions $A(\xi_0)$. Then the second player, called MIN, chooses an action $\eta_0$ in a set of possible actions $B(\xi_0, \zeta_0)$. The actions of both players and the current state determine the payment $r(\xi_0, \zeta_0, \eta_0)$ made by MIN to MAX and the probability distribution $p(\cdot|\xi_0, \zeta_0, \eta_0)$ of the new state $\xi_1$. Then the game continues in the same way with state $\xi_1$ and so on.

At a stage $k$, each player chooses an action knowing the history, defined by $\iota_k = (\xi_0, \zeta_0, \eta_0, \dots, \xi_{k-1}, \zeta_{k-1}, \eta_{k-1}, \xi_k)$ for MAX and $(\iota_k, \zeta_k)$ for MIN. We call a strategy or policy for a player a rule which tells him the action to choose at any stage and in any situation. There are several classes of strategies. Assume $A(x) \subset A$ and $B(x,a) \subset B$ for some sets $A$ and $B$. A behavior or randomized strategy for MAX (resp. MIN) is a sequence $\alpha := (\alpha_0, \alpha_1, \dots)$ (resp. $\beta := (\beta_0, \beta_1, \dots)$) where $\alpha_k$ (resp. $\beta_k$) is a map which to a history $h_k = (x_0, a_0, b_0, \dots, x_{k-1}, a_{k-1}, b_{k-1}, x_k)$ with $x_i \in X$, $a_i \in A(x_i)$, $b_i \in B(x_i, a_i)$ for $0 \le i < k$ (resp. to $(h_k, a_k)$) at stage $k$ associates a probability distribution on a probability space over $A$ (resp. $B$) whose support is included in the set of possible actions $A(x_k)$ (resp. $B(x_k, a_k)$). A Markovian (or feedback) strategy is a strategy which only depends on the information of the current stage $k$: $\alpha_k$ (resp. $\beta_k$) depends only on $x_k$ (resp. $(x_k, a_k)$), and $\alpha_k(h_k)$ (resp. $\beta_k(h_k, a_k)$) is then denoted $\alpha_k(x_k)$ (resp. $\beta_k(x_k, a_k)$). It is said to be stationary if it is independent of $k$; then $\alpha_k$ is also denoted by $\alpha$ and $\beta_k$ by $\beta$. A strategy of any type is said to be pure if for any stage $k$, the values of $\alpha_k$ (resp. $\beta_k$) are Dirac probability measures at certain actions in $A(x_k)$ (resp. $B(x_k, a_k)$); we then also denote by $\alpha_k$ (resp. $\beta_k$) the map which to the history assigns the only possible action in $A(x_k)$ (resp. $B(x_k, a_k)$).

In particular, if $\bar\alpha$ is a pure Markovian stationary strategy, then $\bar\alpha = (\alpha_k)_{k \ge 0}$ with $\alpha_k = \alpha$ for all $k$, where $\alpha$ is a map $X \to A$ such that $\alpha(x) \in A(x)$ for all $x \in X$. In this case, we also speak of a pure Markovian stationary strategy for $\alpha$, and we denote by $A_M$ the set of such maps. We adopt a similar convention for player MIN: $B_M := \{\beta : X \times A \to B \mid \beta(x,a) \in B(x,a) \ \forall x \in X, a \in A(x)\}$.

A strategy $\alpha = (\alpha_k)_{k \ge 0}$ (resp. $\beta = (\beta_k)_{k \ge 0}$) together with an initial state determines stochastic processes $(\zeta_k)_{k \ge 0}$ for the actions of MAX, $(\eta_k)_{k \ge 0}$ for the actions of MIN and $(\xi_k)_{k \ge 0}$ for the states of the game such that

\[ P(\xi_{k+1} = y \mid \iota_k = h_k, \zeta_k = a, \eta_k = b) = p(y|x,a,b) \tag{3a} \]

\[ P(\zeta_k \in A \mid \iota_k = h_k) = \alpha_k(h_k)(A) \tag{3b} \]

\[ P(\eta_k \in B \mid \iota_k = h_k, \zeta_k = a) = \beta_k(h_k, a)(B) \tag{3c} \]

where $\iota_k := (\xi_0, \zeta_0, \eta_0, \dots, \xi_{k-1}, \zeta_{k-1}, \eta_{k-1}, \xi_k)$ is the history process, $h_k$ is a history vector at time $k$: $h_k = (x_0, a_0, b_0, \dots, x_{k-1}, a_{k-1}, b_{k-1}, x)$, and $A$ (resp. $B$) are measurable sets in $A(x)$ (resp. $B(x,a)$). For instance, for each pair of pure Markovian stationary strategies $(\alpha, \beta)$ of the two players, that is such that for $k \ge 0$: $\alpha_k = \alpha$ with $\alpha \in A_M$ and $\beta_k = \beta$ with $\beta \in B_M$, the state process $(\xi_k)_{k \ge 0}$ is a Markov chain on $X$ with transition probability

\[ P(\xi_{k+1} = y \mid \xi_k = x) = p(y|x, \alpha(x), \beta(x, \alpha(x))) \quad \text{for } x, y \in X \]

and $\zeta_k = \alpha(\xi_k)$ and $\eta_k = \beta(\xi_k, \zeta_k)$.

The payoff of the game $\Gamma(x_0)$ starting from $x_0 \in X$ is the expected sum of the rewards at all steps of the game, which MAX wants to maximize and MIN to minimize. In this paper we consider discounted games $\Gamma_\mu$ with discount factor $0 < \mu < 1$: the reward at time $k$ is the payment made by MIN to MAX times $\mu^k$. When the strategies $\alpha$ for MAX and $\beta$ for MIN are fixed, the payoff of the game $\Gamma_\mu(x_0, \alpha, \beta)$ starting from $x_0$ is then

\[ J(x_0, \alpha, \beta) = E^{\alpha,\beta}_{x_0} \Big[ \sum_{k=0}^{\infty} \mu^k\, r(\xi_k, \zeta_k, \eta_k) \Big] \]

where $E^{\alpha,\beta}_{x_0}$ denotes the expectation for the probability law determined by (3). A discounted game can be seen equivalently as a game which has, in each stage, a stopping probability equal to $1 - \mu$, independent of the actions taken by both players. The value of the game starting from $x_0 \in X$, $\Gamma_\mu(x_0)$, is then given by

\[ v(x_0) = \sup_{\alpha} \inf_{\beta} J(x_0, \alpha, \beta), \tag{4} \]

where the supremum is taken over all strategies $\alpha$ for MAX and the infimum is taken over all strategies $\beta$ for MIN. Note that a non terminating game without any discount factor (or $\mu = 1$) is called ergodic.

We are concerned with finding optimal strategies for both players and the value of the discounted game $\Gamma_\mu$ at each point. These are given by the dynamic programming equation [54] defined below.

Theorem 2.1 (Dynamic programming equations [54]). Assume $A(x)$ and $B(x,a)$ are finite sets for all $x \in X$, $a \in A(x)$. Then, the value $v$ of the stochastic game $\Gamma_\mu$, defined in (4), is the unique solution $v : X \to \mathbb{R}$ of the following dynamic programming equation:

\[ v(x) = \max_{a \in A(x)} \min_{b \in B(x,a)} \Big( \sum_{y \in X} \mu\, p(y|x,a,b)\, v(y) + r(x,a,b) \Big) =: F(v;x) \quad \forall x \in X. \tag{5} \]

Moreover, optimal strategies are obtained for both players by taking pure Markovian stationary strategies $\alpha$ for MAX and $\beta$ for MIN such that for all $x$ in $X$, $\alpha(x)$ attains the maximum in (5):

\[ \alpha(x) \in \operatorname*{argmax}_{a \in A(x)} F(v;x,a) \]

where

\[ F(v;x,a) := \min_{b \in B(x,a)} F(v;x,a,b), \qquad F(v;x,a,b) := \sum_{y \in X} \mu\, p(y|x,a,b)\, v(y) + r(x,a,b), \tag{6} \]

and for all $x$ in $X$ and $a$ in $A(x)$, $\beta(x,a)$ attains the minimum in (6):

\[ \beta(x,a) \in \operatorname*{argmin}_{b \in B(x,a)} F(v;x,a,b). \]

Here we use the notation $\operatorname{argmax}_{c \in C} f(c) := \{c \in C \mid f(c) = \max_{c' \in C} f(c')\}$, and similarly for argmin.

We denote by $F$ the dynamic programming operator from $\mathbb{R}^X$ to itself which maps $v$ to the function

\[ F(v) : X \to \mathbb{R}, \quad x \mapsto F(v;x) \tag{7} \]

where $F(v;x)$ is defined in (5). This operator is monotone and contracting with constant $\mu$ in the sup-norm, i.e. $\|F(v) - F(v')\|_\infty \le \mu \|v - v'\|_\infty$ for all $v, v' \in \mathbb{R}^X$. Hence, fixed point iterations on Equation (5), called value iterations in the optimal control and game literature, are contracting for the sup-norm with constant $\mu$.
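The contraction property translates directly into a simple solver. The sketch below applies the operator $F$ of (5) repeatedly and stops with the standard a posteriori bound $\|v_{k+1} - v\|_\infty \le \frac{\mu}{1-\mu} \|v_{k+1} - v_k\|_\infty$. The array layout `P[x, a, b, y]`, `r[x, a, b]`, the function names, and the assumption that the action sets do not depend on the state are choices made for this illustration only.

```python
import numpy as np

def shapley_operator(P, r, mu, v):
    """F(v; x) = max_a min_b ( sum_y mu p(y|x,a,b) v(y) + r(x,a,b) )."""
    Q = mu * (P @ v) + r            # Q[x, a, b]
    return Q.min(axis=2).max(axis=1)

def value_iteration(P, r, mu, eps=1e-8, max_iter=1_000_000):
    """Iterate v <- F(v) until the sup-norm error is at most eps."""
    v = np.zeros(P.shape[0])
    # ||v_new - v*|| <= mu/(1-mu) * ||v_new - v||, hence this threshold.
    threshold = eps * (1.0 - mu) / mu
    for k in range(1, max_iter + 1):
        v_new = shapley_operator(P, r, mu, v)
        if np.max(np.abs(v_new - v)) <= threshold:
            return v_new, k
        v = v_new
    raise RuntimeError("value iteration did not converge")
```

Since the threshold shrinks like $1 - \mu$ and each sweep only gains a factor $\mu$, the number of sweeps grows quickly as $\mu$ approaches one.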



3 Two player zero-sum stochastic differential games: the 
continuous case 

Another class of games which we consider is the class of two player differential stochastic games in continuous time. In these games, the state space is a regular open subset $X$ of $\mathbb{R}^d$ and the dynamics of the game is governed by a stochastic differential equation which is jointly controlled by two players (see [3T1, 13^1 and below). In this case, the value of the game (defined below) is a solution of a non linear elliptic partial differential equation of type (2), called Isaacs equation (see also [3TJ[56]). The discretization of this equation with a monotone scheme in the sense of [8] yields the dynamic programming equation (5) of a stochastic game with discrete state space, as described in the previous section.

In the first subsection below, we give the definitions of differential stochastic games with a bounded state space and a discounted payoff. Then, in the next subsection, we present a subclass of these differential games called optimal stopping time games. Finally, in the last subsection, we introduce the finite difference discretization scheme that we use to discretize the Isaacs equations (12) and (13) respectively. Numerical examples of such games will be presented in Section 6.



3.1 Differential games with regular controls. 

Assume now that the state space is a regular open subset $X$ of $\mathbb{R}^d$. Suppose a probability space $\Omega$ is given, as well as a filtration $(\mathcal{F}_t)_{t \ge 0}$ over it (that is a non decreasing family of $\sigma$-algebras over $\Omega$). We consider games whose dynamics is governed by the following stochastic differential equation:

\[ d\xi_t = g(\xi_t, \zeta_t, \eta_t)\, dt + \sigma(\xi_t, \zeta_t, \eta_t)\, dW_t, \tag{8} \]

with initial state $\xi_0 = x \in X$. Here $W_t$ is a $d'$-dimensional Wiener process on $(\Omega, (\mathcal{F}_t)_{t \ge 0})$; $\zeta_t$ and $\eta_t$ are stochastic processes taking values in closed subsets $A$ and $B$ of some Euclidean spaces respectively; $(x,a,b) \in X \times A \times B \mapsto g(x,a,b) \in \mathbb{R}^d$ and $(x,a,b) \in X \times A \times B \mapsto \sigma(x,a,b) \in \mathbb{R}^{d \times d'}$ are given functions. The dimension $d'$ of the Wiener process may be different from $d$ and is given by the modeling of the problem. Assuming that $\zeta_t$ and $\eta_t$ are adapted to the filtration $(\mathcal{F}_t)_{t \ge 0}$ (that is, for all $t \ge 0$, $\zeta_t$ and $\eta_t$ are $\mathcal{F}_t$-measurable) allows one to define the stochastic process $\xi_t$ satisfying Equation (8), and it is a necessary condition for the assumption that the actions of the two players depend only on the past states and actions. We also consider strategies $\alpha = (\alpha_t)_{t \ge 0}$ (resp. $\beta = (\beta_t)_{t \ge 0}$) of player MAX (resp. MIN) determining the process $(\zeta_t)_{t \ge 0}$ (resp. $(\eta_t)_{t \ge 0}$). In particular, for pure Markovian stationary strategies, one has $\zeta_t = \alpha(\xi_t)$ and $\eta_t = \beta(\xi_t, \zeta_t)$.

When $X = \mathbb{R}^d$, the discounted payoff of the game with discount rate $\lambda > 0$ is given by:

\[ J(x; \alpha, \beta) = E^{\alpha,\beta} \Big[ \int_0^{\infty} e^{-\lambda t}\, r(\xi_t, \zeta_t, \eta_t)\, dt \;\Big|\; \xi_0 = x \Big] \tag{9} \]

where $(x,a,b) \in X \times A \times B \mapsto r(x,a,b) \in \mathbb{R}$ is the (instantaneous, or running) reward function. Now, we consider that $X$ is a regular open subset of $\mathbb{R}^d$. In this case, we denote by $\tau$ the first exit time of the process $(\xi_t)_{t \ge 0}$ from $X$, i.e. $\tau = \inf\{t \ge 0 \mid \xi_t \notin X\}$. Then, the discounted payoff of the game stopped at the boundary is:

\[ J(x; \alpha, \beta) = E^{\alpha,\beta} \Big[ \int_0^{\tau} e^{-\lambda t}\, r(\xi_t, \zeta_t, \eta_t)\, dt + e^{-\lambda \tau} \psi_1(\xi_\tau) \;\Big|\; \xi_0 = x \Big] \tag{10} \]

where the function $x \in \partial X \mapsto \psi_1(x) \in \mathbb{R}$ is called the terminal reward. The value function of the differential stochastic game starting from $x$ is defined as in Section 2 by

\[ v(x) = \sup_{\alpha} \inf_{\beta} J(x; \alpha, \beta) \tag{11} \]

where the supremum is taken over all strategies $\alpha$ for MAX and the infimum is taken over all strategies $\beta$ for MIN.

As previously, we are interested in finding the value function of the game and the corresponding optimal strategies. We denote by $L(v; x, a, b)$ the following second order partial differential operator:

\[ L(v; x, a, b) := \sum_{i,j=1}^{d} q_{ij}(x,a,b) \frac{\partial^2 v(x)}{\partial x_i \partial x_j} + \sum_{j=1}^{d} g_j(x,a,b) \frac{\partial v(x)}{\partial x_j} - \lambda v(x) \]

with $(q_{ij})_{i,j=1,\dots,d} = \frac{1}{2} \sigma \sigma^T$. When $d' \ge d$ and $\sigma(x,a,b)$ is onto for all $x \in X$, $a \in A$, $b \in B$, the matrix $q(x,a,b)$ is of full rank and the operator $L$ is elliptic. Under some regularity assumptions on $X$ and on the functions $g$, $\sigma$, $r$ and $\psi_1$ (for instance boundedness and uniform Lipschitz continuity), the value $v$ of the game is a solution of the dynamic programming equation, called Isaacs partial differential equation:

\[ \begin{cases} \max\limits_{a \in A} \min\limits_{b \in B} \big( L(v; x, a, b) + r(x,a,b) \big) = 0 & \text{for } x \in X, \\ v(x) = \psi_1(x) & \text{for } x \in \partial X. \end{cases} \tag{12} \]

This has been shown in the viscosity sense in ^T]. See also [2Dj and references therein for uniqueness of the solution of (12). If the value $v$ of the game is a classical solution of (12), and $\alpha$ and $\beta$ are strategies such that for all $x$ in $X$ and $a$ in $A(x)$, $\alpha(x)$ and $\beta(x,a)$ are the unique actions that realize the maximum and the minimum in Equation (12) for MAX and MIN respectively, then $\alpha$ and $\beta$ are pure Markovian stationary strategies that are optimal for (11) (with $\xi$, $\zeta$, $\eta$ satisfying (8), (10), with $\zeta_t = \alpha(\xi_t)$ and $\eta_t = \beta(\xi_t, \zeta_t)$).

Note that for a game with one player, i.e. for a stochastic control problem, Equation (12) is the so-called Hamilton-Jacobi-Bellman equation. Also, when $X$ is bounded and $L$ is strongly uniformly elliptic (that is, for some $c > 0$, $q(x,a,b) \ge cI$ for all $x \in X$, $a \in A$, $b \in B$), the case $\lambda = 0$ can also be considered.



3.2 Differential games with optimal stopping control 

When the actions $(\zeta_t, \eta_t)$ of the players are not continuous or not bounded, the dynamic programming equation of the game is no longer of the form of Equation (12), but may be a variational inequality or a quasi-variational inequality; see for instance [33, 11] for the case of optimal stopping games with one or two players and [30, 12] for impulse or singular control.

We consider here an optimal stopping game, that is a game in which one of the players has the choice of stopping the game at any moment (see [33] for a more general case). We assume here that MAX has this ability. Then at each time $t$, he chooses whether or not to stop the game, that is he chooses an element of the action space $\{0, 1\}$, where 1 means that the game continues and 0 that the game stops, with $\zeta_s = 0$ and $\xi_s = \xi_t$ for $s \ge t$ when $\zeta_t = 0$ (i.e. $g(x,0,b) = 0$ and $\sigma(x,0,b) = 0$ for all $b \in B$, $x \in X$ in (8)). The second player MIN plays as previously and we consider the same model as in the previous subsection. A strategy $\alpha$ for MAX determines a process $(\zeta_t)_{t \ge 0}$ adapted to the filtration of $(\xi_t)_{t \ge 0}$ (that is $(\sigma(\xi_t))_{t \ge 0}$), hence a stopping time $\kappa = \inf\{t \ge 0 \mid \zeta_t = 0\}$ adapted to the process $(\xi_t)_{t \ge 0}$, and vice versa.

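So if $r(x, 0, b) = \lambda \psi_2(x)$ for all $b \in B$, the discounted payoff (10) can be written as a function of the stopping time $\kappa$ instead of $\alpha$:

\[ J(x; \kappa, \beta) = E^{\kappa,\beta} \Big[ \int_0^{\kappa \wedge \tau} e^{-\lambda t}\, r(\xi_t, 1, \eta_t)\, dt + e^{-\lambda \kappa} \psi_2(\xi_\kappa) \mathbf{1}_{\kappa < \tau} + e^{-\lambda \tau} \psi_1(\xi_\tau) \mathbf{1}_{\kappa \ge \tau} \;\Big|\; \xi_0 = x \Big]. \]

Indeed, if $\kappa < \tau$, then $\xi_s = \xi_\kappa \in X$ for $s \ge \kappa$, so $\tau = +\infty$, and $\int_\kappa^{\infty} e^{-\lambda t}\, r(\xi_t, \zeta_t, \eta_t)\, dt = e^{-\lambda \kappa} \psi_2(\xi_\kappa)$.

The value function (11) of the game starting from $x$ is then given by

\[ v(x) = \sup_{\kappa} \inf_{\beta} J(x; \kappa, \beta) \]

where the supremum is taken over all stopping times $\kappa \le \tau$ and the infimum is taken over all strategies $\beta$ for MIN.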



Since the variable "a" appears only when equal to 1, one can omit it in the equations, hence Equation (12) becomes

\[ \begin{cases} \max \Big\{ \min\limits_{b \in B} \big( L(v; x, b) + r(x, b) \big),\ \lambda \big( \psi_2(x) - v(x) \big) \Big\} = 0 & \text{for } x \text{ in } X, \\ v(x) = \psi_1(x) & \text{for } x \in \partial X. \end{cases} \tag{13} \]

Since $\lambda > 0$, one can divide the second term of the maximum by $\lambda$, and get the variational inequality in the usual form used in the viscosity solutions literature. In another usual way, Equation (13) can be written as

\[ \text{for } x \in X: \quad \begin{cases} \min\limits_{b \in B} \big( L(v; x, b) + r(x, b) \big) \le 0, \\ \psi_2(x) - v(x) \le 0, \\ \Big( \min\limits_{b \in B} \big( L(v; x, b) + r(x, b) \big) \Big) \big( \psi_2(x) - v(x) \big) = 0, \end{cases} \tag{14} \]

with $v(x) = \psi_1(x)$ for $x \in \partial X$. Both Equation (13) and Equation (14) are called variational inequalities. Note however that Equation (13), or the equation obtained from it by dividing the second term by $\lambda$, better reveals the control nature of the problem and can be used to define viscosity solutions (where one needs to write equations in the form $F(x, v(x), Dv(x), D^2 v(x)) = 0$ on $X$), whereas Equation (14) is better adapted to a variational approach.



As for (12), if $v$ is a classical solution of (13) or (14), if for all $x$ in $X$, $\alpha(x)$ is equal to 1 when the first term is maximal in (13) and to 0 when the second term is, and if for all $x$ in $X$, $\beta(x, 1)$ is the action $b \in B$ which realizes the minimum in the first term of (13), then an optimal pure Markovian stationary strategy is obtained by taking $\eta_t = \beta(\xi_t, 1)$ and $\kappa$ equal to the first time when $\alpha(\xi_t) = 0$. So this equation behaves as Equation (12) but where the first player has a discrete action space equal to $\{0, 1\}$, 1 meaning continue to play and 0 meaning stop the game. This variational inequality can be treated with the same methods as (12).



3.3 Discretization 



Several discretization methods may transform equations (12) or (13) into a dynamic programming equation of the form (5). This is the case when using Markov discretizations of the diffusion (12) as in [39, 40], and in general when using discretization schemes that are monotone in the sense of [8]. One can obtain such discretizations by using the simple finite difference scheme below when there are no mixed derivatives (that is, $\sigma \sigma^T$ is a diagonal matrix). Under less restrictive assumptions on the coefficients, finite difference schemes with larger stencils also lead to monotone schemes p!51fi5]. In the deterministic case (when $\sigma = 0$), one can also use a semi-Lagrangian scheme [51 [7] or a max-plus finite element method [3], both of them leading to a discrete equation of the form (5).

We suppose that $X$ is the $d$-dimensional open unit cube. Let $h = \frac{1}{m}$ ($m \in \mathbb{N}^*$) denote the finite difference step in each coordinate direction, $e_i$ the unit vector in the $i$-th coordinate direction, and $x = (x_1, \dots, x_d)$ a point of the uniform grid $X_h = X \cap (h\mathbb{Z})^d$. Equation (12) is discretized by replacing the first and second order derivatives of $v$ by the following approximations, for $i = 1, \dots, d$:

\[ \frac{\partial v(x)}{\partial x_i} \approx \frac{v(x + h e_i) - v(x - h e_i)}{2h} \tag{15} \]

or

\[ \frac{\partial v(x)}{\partial x_i} \approx \begin{cases} \dfrac{v(x + h e_i) - v(x)}{h} & \text{when } g_i(x,a,b) \ge 0, \\ \dfrac{v(x) - v(x - h e_i)}{h} & \text{when } g_i(x,a,b) < 0, \end{cases} \tag{16} \]

\[ \frac{\partial^2 v(x)}{\partial x_i^2} \approx \frac{v(x + h e_i) - 2 v(x) + v(x - h e_i)}{h^2}. \tag{17} \]

Approximation (15) may be used when $L$ is uniformly elliptic and $h$ is small, whereas (16) has to be used when $L$ is degenerate (see [39, 40]). For equations (12) and (13), these differences are computed on the entire grid $X_h$, by prolonging $v$ to the "boundary" $\partial X_h := \partial X \cap (h\mathbb{Z})^d$ using the Dirichlet boundary condition:

\[ v(x) = \psi_1(x) \quad \forall x \in \partial X \cap (h\mathbb{Z})^d. \]

We obtain a system of $N_h$ non linear equations in $N_h$ unknowns, the values of the function $v_h : x \in X_h \mapsto v_h(x) \in \mathbb{R}$:

\[ \max_{a \in A} \min_{b \in B} \big( L_h(v_h; x, a, b) + r(x,a,b) \big) = 0 \quad \forall x \in X_h, \tag{18} \]

where $N_h = \# X_h \sim 1/h^d$ and $L_h$ is a function which to $v \in \mathbb{R}^{X_h}$, $x \in X_h$, $a \in A$, $b \in B$ associates the approximation of $L(v; x, a, b)$.

When there are no mixed derivatives ($a_{i,j}(x,a,b) = 0$ if $i \neq j$, $i,j \in \{1, \dots, d\}$), the discretization
is monotone in the sense of [8]. Then, if (12) has a unique viscosity solution, the solution $v_h$
of (18) converges uniformly to the solution $v$ of (12) [8]. Moreover, multiplying Equation (18) by
$ch^2$ with $c$ small enough, it can be rewritten in the form (5), with a discount factor $\mu = 1 - O(\lambda c h^2)$.
A similar result holds for the discretization of (13) (by multiplying only the diffusion part by $ch^2$).

We refer to section 6.1 for an example of an Isaacs equation (23) whose discretization (using
scheme (16)-(17)) yields an equation (24) which has the form of (5).



4 Background for numerical solution of discrete dynamic 
programming equations 

In this section, we present the policy iteration algorithm to solve the dynamic programming equation
(5) of a two player zero-sum discounted stochastic game with finite state space. We first
present the policy iteration algorithm for a one player game, which is then used in the following
subsection to define the policy iteration algorithm for the two player case. The last part of this
section recalls the multigrid methods that we will use in the policy iterations for
solving the linear systems.

4.1 Policy iteration algorithm for one player games 

First, we consider a one player stochastic game with a MIN player and finite state space $X$. In this
case, the dynamic programming operator $F$, mapping $\mathbb{R}^n$ to itself, is given for each $x \in X$ by:

$$F(v; x) = \min_{b \in B(x)} \Big\{ \sum_{y \in X} \mu \, p(y|x,b) \, v(y) + r(x,b) \Big\} . \tag{19}$$

This game is more commonly called a Markov Decision Process (MDP) with finite state space X, 
we refer to [38l [221 EO] for a deeper description on this topic. Then, the discounted value of the 
game starting in $x \in X$ is given by:

$$v(x) = \inf_{\beta} \mathbb{E}_x^{\beta} \Big[ \sum_{k=0}^{\infty} \mu^k \, r(\xi_k, \eta_k) \Big]$$

where the processes $\xi_k$, $\eta_k$ and the strategies $\beta$ are defined as in Section 2. The value $v$ of the
game is the solution of the dynamic programming equation $v(x) = F(v; x)$ for $x$ in $X$. The
policy iteration algorithm for Markov Decision Processes, which was first introduced by Howard [38],
is given in Algorithm 1. It yields the discounted value of the game $v : X \to \mathbb{R}$ and the optimal
policy for MIN.

Each policy iteration of Algorithm 1 strictly improves the current policy and produces a
non-increasing sequence of values $(v^k)_{k \ge 1}$. It implies that the algorithm never visits the same
policy twice. Hence if the action sets are finite at each point of $X$, the policy iterations stop after
a finite time (see for instance [5TJ HI] [T3] ) . Moreover, under regularity assumptions, the policy 
iteration algorithm for a one player game with infinite action spaces is equivalent to Newton's 



Algorithm 1 Policy iteration algorithm for Markov Decision Processes (one player game)

Given an initial policy $\beta^0 \in B_M$, the policy iterations consist in applying successively the two
following steps:

1. Compute the value $v^{k+1}$ of the game with fixed feedback policy $\beta^k$, that is the solution of

$$v^{k+1}(x) = \sum_{y \in X} \mu \, p(y|x, \beta^k(x)) \, v^{k+1}(y) + r(x, \beta^k(x)), \quad x \in X. \tag{20}$$

2. Improve the policy: find the optimal feedback policy $\beta^{k+1}$ for the value $v^{k+1}$, i.e. for each $x$
in $X$, choose $\beta^{k+1}(x)$ such that:

$$\beta^{k+1}(x) \in \operatorname*{argmin}_{b \in B(x)} \Big\{ \sum_{y \in X} \mu \, p(y|x,b) \, v^{k+1}(y) + r(x,b) \Big\}$$

until we cannot improve the policy anymore.

method [21 IS [El [51] . Indeed, define G{v) = F{v) — v, then the problem is to find the solution of 
$G(v) = 0$ where all entries of $G$ are concave functions. The policy improvement step can be seen as
the computation of an element of the sup-differential of $G$ at the current approximation $v^{k+1}$, and
the value improvement step computes the zero of the previous sup-differential. When $G$ is regular,
the sequence of value functions $(v^k)_{k \ge 1}$ is exactly the sequence of Newton's algorithm.
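The two steps of Algorithm 1 can be sketched as follows for a small MDP stored in nested lists. This is a minimal illustration rather than the authors' C implementation: the data layout `p[x][b][y]`, `r[x][b]`, the dense Gaussian elimination used for the linear system (20) and the tolerance `tol` are all assumptions made for the example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            factor = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def policy_iteration(p, r, mu, tol=1e-12):
    """Policy iteration for a discounted MDP with a MIN player.

    p[x][b][y]: transition probability from x to y under action b,
    r[x][b]:    running cost, mu: discount factor in (0, 1).
    Returns the value v and the final policy beta.
    """
    n = len(p)
    beta = [0] * n                      # initial policy: first action everywhere
    while True:
        # Step 1: value of the fixed policy, (I - mu * P_beta) v = r_beta
        A = [[(1.0 if x == y else 0.0) - mu * p[x][beta[x]][y] for y in range(n)]
             for x in range(n)]
        v = solve_linear(A, [r[x][beta[x]] for x in range(n)])
        # Step 2: policy improvement, state by state
        changed = False
        for x in range(n):
            def q(b):
                return mu * sum(p[x][b][y] * v[y] for y in range(n)) + r[x][b]
            best = min(range(len(p[x])), key=q)
            if q(best) < q(beta[x]) - tol:   # switch only on strict improvement
                beta[x] = best
                changed = True
        if not changed:
            return v, beta
```

Switching only on strict improvement mirrors the remark above that the algorithm never visits the same policy twice.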

4.2 Policy iteration algorithm for two player games 

Now, we give the policy iteration algorithm for solving two player zero-sum stochastic games with
finite state space $X$, as defined in Puri's thesis [49]. Recalling the definitions of Section 2, we need to
solve the dynamic programming Equation (|5| which give us the value of the game (Equation Q) 
and the optimal strategies for both players. For a fixed pure feedback policy for MAX $\alpha \in A_M$, the
value $v$ of the game is the solution of the equation $v = F^{\alpha}(v)$ where $F^{\alpha}$ is an operator mapping $\mathbb{R}^n$ to
itself whose $x$-coordinate is given by:

$$F^{\alpha}(v; x) := F(v; x, \alpha(x)) = \min_{b \in B(x, \alpha(x))} \Big\{ \sum_{y \in X} \mu \, p(y|x, \alpha(x), b) \, v(y) + r(x, \alpha(x), b) \Big\}$$

for each $x \in X$ and $v \in \mathbb{R}^n$. Note that $F^{\alpha}$ is the dynamic programming operator of a one player
game with only the MIN player. Then the policy iteration algorithm is given in Algorithm 2.

Step 1 of Algorithm 2 is performed by using the policy iteration algorithm for a one player game.
That is, given an initial feedback policy for MIN $\beta^{s,0} \in B_M$, we iterate on MIN policies $\beta^{s,k} \in B_M$
and value functions $v^{s,k}$. Then at each step $k$ of the interior policy iteration (Algorithm 1, Step 1),
one computes $v^{s,k+1}$, the value of the game with fixed strategies $\alpha^s \in A_M$ for MAX and $\beta^{s,k} \in B_M$
for MIN. This is done by solving the linear system:

$$v^{s,k+1} = \mu \, M^{\alpha^s \beta^{s,k}} v^{s,k+1} + r^{\alpha^s \beta^{s,k}} , \tag{21}$$

where for all $\alpha \in A_M$, $\beta \in B_M$: $M^{\alpha\beta} \in \mathbb{R}^{n \times n}$ is a stochastic matrix whose elements are defined
by $(M^{\alpha\beta})_{xy} = p(y|x, \alpha(x), \beta(x))$ for all $x, y \in X$ and $r^{\alpha\beta} \in \mathbb{R}^n$ is the vector whose elements are
defined by $(r^{\alpha\beta})_x = r(x, \alpha(x), \beta(x))$ for $x \in X$.

As for the one player case, each iteration of the policy iteration algorithm strictly improves the
current policy, hence it can never visit the same policy twice. Moreover, the algorithm produces a
non-decreasing (resp. non-increasing) sequence of values $(v^s)_{s \ge 1}$ (resp. $(v^{s,k})_{k \ge 1}$) of the external
loop (resp. internal loop), see [1^119) . It follows that if the action sets for both players are finite 
at each point of $X$, the policy iterations stop after a finite time [49].

Algorithm 2 Policy Iteration

Given an initial policy $\alpha^0 \in A_M$ for MAX, the policy iterations consist in applying successively the
two following steps:

1. Compute the value $v^{s+1}$ of the game with fixed feedback policy $\alpha^s$, that is the solution of

$$v^{s+1} = F^{\alpha^s}(v^{s+1})$$

by using Algorithm 1.

2. Improve the policy: find the optimal feedback policy $\alpha^{s+1}$ of MAX for the value $v^{s+1}$, i.e.
for each $x$ in $X$, choose $\alpha^{s+1}(x)$ such that:

$$\alpha^{s+1}(x) \in \operatorname*{argmax}_{a \in A(x)} F(v^{s+1}; x, a)$$

where $F(v; x, a)$ is the operator appearing in the definition of $F^{\alpha}$ above,
until we cannot improve the policy anymore.
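The nesting of Algorithm 1 inside Algorithm 2 can be sketched as follows on a small game. The layout `p[x][a][b][y]`, `r[x][a][b]`, the dense linear solver standing in for the solver of (21), and the choice to restart MIN from its first action at every external step are assumptions made for this illustration only.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (stand-in for the solver of (21))."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            factor = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def q_value(p, r, mu, v, x, a, b):
    """mu * sum_y p(y|x,a,b) v(y) + r(x,a,b)."""
    return mu * sum(pxy * vy for pxy, vy in zip(p[x][a][b], v)) + r[x][a][b]


def min_policy_iteration(p, r, mu, alpha, tol):
    """Internal loop: Algorithm 1 for the one player game with MAX policy alpha fixed."""
    n = len(p)
    beta = [0] * n
    while True:
        A = [[(1.0 if x == y else 0.0) - mu * p[x][alpha[x]][beta[x]][y]
              for y in range(n)] for x in range(n)]
        v = solve_linear(A, [r[x][alpha[x]][beta[x]] for x in range(n)])
        changed = False
        for x in range(n):
            a = alpha[x]
            best = min(range(len(p[x][a])), key=lambda b: q_value(p, r, mu, v, x, a, b))
            if q_value(p, r, mu, v, x, a, best) < q_value(p, r, mu, v, x, a, beta[x]) - tol:
                beta[x], changed = best, True
        if not changed:
            return v, beta


def game_policy_iteration(p, r, mu, tol=1e-12):
    """External loop: Algorithm 2 on the policies of MAX."""
    n = len(p)
    alpha = [0] * n
    while True:
        v, beta = min_policy_iteration(p, r, mu, alpha, tol)      # Step 1
        changed = False
        for x in range(n):                                        # Step 2
            def F(a):
                return min(q_value(p, r, mu, v, x, a, b) for b in range(len(p[x][a])))
            best = max(range(len(p[x])), key=F)
            if F(best) > F(alpha[x]) + tol:
                alpha[x], changed = best, True
        if not changed:
            return v, alpha, beta
```

On a one-state game with rewards `[[1, 3], [2, 4]]` and `mu = 0.5`, MAX ends on its second action and MIN on its first, giving the value `2 / (1 - 0.5) = 4`.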

4.3 AMG 



The linear systems defined in (21) all have the form $v = \mu M v + r$ where $M$ is a Markov matrix. We
solve them using algebraic multigrid methods, which we recall in this section.

Standard multigrid was originally created in the seventies to efficiently solve linear elliptic
partial differential equations (see for instance [42]). It works as follows. Multigrid methods require
discretizations of the given continuous equation on a sequence of grids, each of which, starting from
a coarse grid, is a refinement of the previous one, until a given accuracy is attained. The size of the
coarsest grid is chosen such that solving the problem on it is cheap. Assume also that
transfer operators between these grids are given: interpolation and restriction. Then, a multigrid
cycle on the finest grid consists of the following steps: first, the application of a smoother on the finest grid; then a
restriction of the residual to the next coarser grid; then the solution of the residual problem on this coarser
grid using the same multigrid scheme; then the interpolation of this solution (which is an approximation
of the error) and the correction of the error on the fine grid; finally, the application of a smoother on the
finest grid. If the multigrid components are properly chosen, this process efficiently finds the
solution on the finest grid. Indeed, in general the relaxation process smooths the error, which
can then be well approximated by elements in the range of the interpolation. It implies, in good
cases, that the contraction factor of the multigrid method is independent of the discretization step
and that the complexity is of the order of the number of discretization points. We shall refer to
this standard method as geometric multigrid.

The algebraic multigrid method, called AMG, was initially developed in the early eighties (see
for example [18], [17], [53]) for solving large sparse linear systems arising from the discretization of
partial differential equations on unstructured grids, from PDEs not suitable for the application of
the geometric multigrid solver, or from large discrete problems not derived from any continuous problem.

The AMG method consists of two phases, called "setup phase" and "solving phase". In contrast
to geometric multigrid, the construction of the coarse levels (coarse "grids"), which constitutes
the setup phase, is based only on the algebraic equations. The points of the fine grids are
represented by the variables and the coarse grids by subsets of these variables. The selection of those
coarse variables and the construction of the transfer operators between levels are done in such a
way that the range of the interpolation approximates the errors not reduced by a given relaxation
scheme. Then the "solving phase" is performed in the same way as a geometric multigrid method
and consists of the application of a smoother and a correction of the error by a coarse grid solution.
The whole process is briefly recalled below.

Consider a system of $n$ linear equations given in the matrix form:

$$A v = f \tag{22}$$

where the matrix $A \in \mathbb{R}^{n \times n}$ and the vector $f \in \mathbb{R}^n$ are given, and we are looking for the vector
$v \in \mathbb{R}^n$. We call fine grid $\Omega^0$ the set of all variables of the system, i.e. $\Omega^0 = \{1, \dots, n\}$.
First, recall that a relaxation method consists of the following iterations:

$$u \leftarrow S u + S_0 f \quad \text{with} \quad S = I - S_0 A$$

where $S$ is called the smoothing operator and $I$ is the identity operator in $\mathbb{R}^{n \times n}$. The error $e = u - v$
propagates as

$$e \leftarrow S e .$$

The method is said to converge if $\rho(S) < 1$ where $\rho(S) = \max_i |\lambda_i|$ is the spectral radius of $S$
with $\lambda_i$ its eigenvalues. For example, the smoothing operator of the weighted Jacobi method is
$S = I - \omega D^{-1} A$ and that of the Gauss-Seidel method is $S = I - L^{-1} A$, where $D$ and $L$ are the diagonal
and lower triangular parts of the matrix $A$ respectively.
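As an illustration of the relaxation $u \leftarrow S u + S_0 f$, the two smoothers mentioned above can be written as single sweeps over a small dense system. The function names and the default weight `omega = 2/3` are choices made for this sketch, not values taken from the paper.

```python
def weighted_jacobi_sweep(A, f, u, omega=2.0 / 3.0):
    """One sweep u <- u + omega * D^{-1} (f - A u), i.e. S_0 = omega * D^{-1}."""
    n = len(f)
    new_u = []
    for i in range(n):
        residual_i = f[i] - sum(A[i][j] * u[j] for j in range(n))
        new_u.append(u[i] + omega * residual_i / A[i][i])
    return new_u


def gauss_seidel_sweep(A, f, u):
    """One forward sweep u <- u + L^{-1} (f - A u), with L the lower triangular part of A."""
    u = list(u)                  # work on a copy, updated in place
    n = len(f)
    for i in range(n):
        s = sum(A[i][j] * u[j] for j in range(n) if j != i)
        u[i] = (f[i] - s) / A[i][i]   # uses already updated entries u[0..i-1]
    return u
```

Both sweeps leave the exact solution of $Av = f$ unchanged, which is the fixed point property behind the error propagation $e \leftarrow S e$.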

Denote by $\Omega^l$ the grid on level $l$, where level $0$ corresponds to the finest grid $\Omega^0$. The construction
of the coarse grid $\Omega^{l+1}$ from the fine grid $\Omega^l$ consists in splitting the $n_l$ variables of the
grid $\Omega^l$ into two disjoint subsets, namely $C$, which contains the variables belonging to both grids
$\Omega^l$ and $\Omega^{l+1}$, and $F$, the variables belonging to the grid $\Omega^l$ only. We then have $\Omega^l = C \cup F$. The
coarse grid $\Omega^{l+1} = C$ contains $n_{l+1}$ variables. This splitting is based on the "connections" between
the variables on level $l$ [18], [55], and is such that the range of the associated interpolation or prolongation
operator $P^l_{l+1}$ accurately approximates the errors not efficiently reduced by the relaxation phase
(these errors are "smooth" in the algebraic multigrid terminology). The restriction operator $R^{l+1}_l$
maps residuals from grid $\Omega^l$ to the grid $\Omega^{l+1}$. In [TH1IS3], the operator is fixed to be $R^{l+1}_l = (P^l_{l+1})^T$.
The coarse grid operator is defined by $A^{l+1} = R^{l+1}_l A^l P^l_{l+1}$, where $A^{l+1}$ is the approximation of $A^l$
on $\Omega^{l+1}$ and $A^0 = A$. Similarly, for any vector $w^l \in \mathbb{R}^{n_l}$ we denote by $w^{l+1} = R^{l+1}_l w^l$ its restriction
on $\Omega^{l+1}$. This construction can be repeated recursively from the finest level $l = 0$ to the coarsest
level $L$.

The solution phase consists in applying the multigrid cycle described in Algorithm 3; it is called
a V($\nu_1, \nu_2$)-cycle if $\gamma = 1$ and a W($\nu_1, \nu_2$)-cycle if $\gamma = 2$. A convergence theorem for the V-cycle is given



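Algorithm 3 Multigrid scheme $u^l \leftarrow MG(u^l, f^l)$

if $l < L$ then
    pre-relaxation:
        $u^l \leftarrow S u^l + S_0 f^l$ (on $\Omega^l$) $\nu_1$ times
    coarse grid correction:
        $f^{l+1} \leftarrow R^{l+1}_l (f^l - A^l u^l)$
        $u^{l+1} \leftarrow 0$
        $u^{l+1} \leftarrow MG(u^{l+1}, f^{l+1})$ $\gamma$ times
        $u^l \leftarrow u^l + P^l_{l+1} u^{l+1}$
    post-relaxation:
        $u^l \leftarrow S u^l + S_0 f^l$ (on $\Omega^l$) $\nu_2$ times
else
    Solve $A^L u^L = f^L$
end if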



in|53j for A symmetric and positive definite. See also [HI Ull ES] , for two-level convergence for 
linear systems where the matrix of the system is a M-matrix, symmetric and positive definite. Also 
we can find in the literature, two-grid convergence analysis for non-symmetric linear system in |47j 
and HI]. 
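The recursive cycle of Algorithm 3, with the Galerkin coarse operators $A^{l+1} = (P^l_{l+1})^T A^l P^l_{l+1}$, can be sketched as follows. The setup phase of AMG (C/F splitting and algebraic interpolation) is not reproduced here: as a stated simplification, the prolongations are fixed one-dimensional linear interpolations, and all function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gauss_seidel(A, f, u, sweeps):
    u, n = list(u), len(f)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * u[j] for j in range(n) if j != i)
            u[i] = (f[i] - s) / A[i][i]
    return u


def solve_direct(A, f):
    """Gaussian elimination with partial pivoting, used on the coarsest level only."""
    n = len(f)
    M = [list(row) + [f[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            factor = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def linear_interpolation(n_coarse):
    """Prolongation from n_coarse interior points to 2 * n_coarse + 1 points (1D)."""
    P = [[0.0] * n_coarse for _ in range(2 * n_coarse + 1)]
    for j in range(n_coarse):
        i = 2 * j + 1
        P[i][j] = 1.0
        P[i - 1][j] += 0.5
        P[i + 1][j] += 0.5
    return P


def galerkin_hierarchy(A, prolongations):
    """Coarse operators A^{l+1} = P^T A^l P, with restriction fixed to P^T."""
    ops = [A]
    for P in prolongations:
        ops.append(matmul(matmul(transpose(P), ops[-1]), P))
    return ops


def mg_cycle(level, ops, prolongations, u, f, nu1=1, nu2=1, gamma=1):
    """One cycle of Algorithm 3: gamma = 1 gives a V-cycle, gamma = 2 a W-cycle."""
    A = ops[level]
    if level == len(ops) - 1:
        return solve_direct(A, f)                      # coarsest level
    u = gauss_seidel(A, f, u, nu1)                     # pre-relaxation
    P = prolongations[level]
    residual = [fi - ai for fi, ai in zip(f, matvec(A, u))]
    f_c = matvec(transpose(P), residual)               # restricted residual
    u_c = [0.0] * len(f_c)
    for _ in range(gamma):
        u_c = mg_cycle(level + 1, ops, prolongations, u_c, f_c, nu1, nu2, gamma)
    u = [ui + ci for ui, ci in zip(u, matvec(P, u_c))] # coarse grid correction
    return gauss_seidel(A, f, u, nu2)                  # post-relaxation
```

For instance, on the 7 x 7 tridiagonal matrix with stencil (-1, 2, -1), the hierarchy built from `[linear_interpolation(3), linear_interpolation(1)]` has three levels, and repeated calls to `mg_cycle(0, ...)` drive the residual towards zero.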

5 A multigrid algorithm for discrete dynamic programming 
equations 

5.1 Policy iteration combined with algebraic multigrid method (AMGπ)

Recall that in the policy iteration algorithm for games, at each step $k$ of the interior policy iteration,
we have to solve a linear system (21) which is of the form $v = \mu M v + r$ with $M$ a Markov
matrix and $0 < \mu < 1$ the discount factor. Since the matrices $(I - \mu M)$ are non-singular M-matrices, we use
AMG to solve those systems. For brevity, in the sequel we shall call AMGπ the resulting algorithm,
that is the combination of policy iterations and AMG. The name AMGπ also refers to the
numerical implementation of this algorithm. Note that in practice, in Algorithm 1 (equivalently
in Algorithm 2), the policy iterations are stopped when, after Step 1, the norm of the residual
$r_v = F(v) - v$ is smaller than a given value denoted by $\epsilon$. We used this stopping criterion in
AMGπ. The iterations of AMGπ are summarized in the scheme represented in Figure 1, where
$(v^{s,k,0}, \dots, v^{s,k+1,0})$ is a sequence of value functions generated by the multigrid solver.

Figure 1: Representation of the nested iterations of AMGπ (external policy iteration, internal policy iteration, and AMG iterations from $v^{s,k,0}$ to $v^{s,k+1,0}$).



The algebraic multigrid method allows us to solve linear systems arising either from the discretization
of Isaacs or Hamilton-Jacobi-Bellman equations or from a true finite state space zero-sum two player
game. However, in the present paper, we restrict ourselves to numerical tests for the discretization
of stochastic differential games, since the AMG algorithm needs some improvements to be applied
to arbitrary non-symmetric linear systems arising in game problems.

In the one player game case, convergence results of combination of policy iteration and geometric 
multigrid method have been established by Hoppe [3S1 [37] and Akian [I] [5] . 

5.2 Full multi-level policy iteration (FAMGπ)

Recall that the number of policy iterations can be exponential in the cardinality of the state
space $X$. However, as for Newton's algorithm, convergence can be improved by starting the policy
iterations with a good initial guess, close to the solution. With this in mind, we present a full
multi-level scheme, that we shall call FAMGπ. As in standard FMG, starting from the coarsest
level, it consists in solving the problem at each grid level by performing the policy iterations of AMGπ
until a convergence criterion is satisfied, then interpolating the strategies and the value function to the
next level, in order to initialize the policy iterations of that level. This scheme is repeated until
the finest level is attained.



The algorithm FAMGπ only applies to Isaacs partial differential equations (12). It works as
follows. The state space $X$ is first discretized on a sequence of $L_p + 1$ grids: $X_{L_p} \subset \dots \subset X_1 \subset
X_0 = X_h$ such that on grid $X_l$, $0 \le l \le L_p$, the discretization step is $h_l = 2^l h$, where $h$ is the
discretization step chosen on the finest grid $X_0$. Then, the Isaacs PDE is discretized on all levels,
$0 \le l \le L_p$, using the finite difference scheme (16)-(17). For level $l$, we denote by $F_l : \mathbb{R}^{X_l} \to \mathbb{R}^{X_l}$
the dynamic programming operator, $(v)^l : X_l \to \mathbb{R}$ the value of the game, $x \in X_l \mapsto (\alpha)^l(x) \in A(x)$
and $(x \in X_l, a \in A(x)) \mapsto (\beta)^l(x,a) \in B(x,a)$ the strategies of MAX and MIN respectively. We
denote by $I_l^{l-1}$ the linear interpolation operator which maps any vector $(v)^l$ from $\mathbb{R}^{X_l}$ to $\mathbb{R}^{X_{l-1}}$:

$$\big(I_l^{l-1} (v)^l\big)(x) = \begin{cases} (v)^l(x) & x \in X_l , \\ \sum_{y \in N(x)} \frac{1}{|N(x)|} (v)^l(y) & x \in X_{l-1} \setminus X_l , \end{cases}$$

where $N(x) = \{ y \in X_l \mid \|x - y\|_2 \le h_l \}$ for $x \in X_{l-1} \setminus X_l$, and we denote by $\tilde{I}_l^{l-1}$ the operator
which interpolates a strategy from grid $X_l$ to grid $X_{l-1}$, for instance for a strategy of MAX:

$$\big(\tilde{I}_l^{l-1} (\alpha)^l\big)(x) = \begin{cases} (\alpha)^l(x) & x \in X_l , \\ a_0 \in A(x) & x \in X_{l-1} \setminus X_l , \end{cases}$$

where $a_0$ is chosen arbitrarily in $A(x)$ for $x \in X_{l-1} \setminus X_l$.

Figure 2: FAMGπ with AMGπ V-cycles (legend: interpolation of strategies and value; AMGπ).

We denote by AMGπ($\alpha, \beta, v, \epsilon$) the algorithm
AMGπ with initial strategy $\alpha$ for the iterations of player MAX, initial policy $\beta$ for the first iteration of
player MIN, value $v$ as initial approximation for the first call of AMG and $\epsilon$ the stopping criterion
for the policy iterations. Then the FAMGπ algorithm is given in Algorithm 4, where $c > 0$ is a given
constant.

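Algorithm 4 FAMGπ

Given an initial $(\alpha^0)^{L_p}$, $(\beta^0)^{L_p}$ and $(v^0)^{L_p}$ on level $L_p$,
for $l = L_p$ to $1$ do
    $((\alpha)^l, (\beta)^l, (v)^l) \leftarrow$ AMGπ$((\alpha^0)^l, (\beta^0)^l, (v^0)^l, c h_l^2)$ on level $l$
    $(v^0)^{l-1} = I_l^{l-1} (v)^l$
    $(\alpha^0)^{l-1} = \tilde{I}_l^{l-1} (\alpha)^l$ and $(\beta^0)^{l-1} = \tilde{I}_l^{l-1} (\beta)^l$
end for
solve $v = F(v)$ on $X_h$ by using AMGπ$((\alpha^0)^0, (\beta^0)^0, (v^0)^0, \epsilon)$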
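The structure of the full multi-level loop can be sketched in one space dimension as follows. Everything here is a simplified illustration under stated assumptions: grids are lists of nodal values including the boundary nodes, strategies are stored per node only (in the paper the MIN strategy also depends on the action of MAX), and `solve_level` is a hypothetical callback standing for a call to AMGπ on a given level.

```python
def interpolate_value(v_coarse):
    """Linear interpolation of nodal values from a grid of step 2h to a grid of step h (1D)."""
    v_fine = []
    for i, val in enumerate(v_coarse):
        v_fine.append(val)                                   # node shared by both grids
        if i + 1 < len(v_coarse):
            v_fine.append(0.5 * (val + v_coarse[i + 1]))     # new node: mean of its two neighbours
    return v_fine


def interpolate_strategy(s_coarse, default_action):
    """Copy the strategy on shared nodes, use an arbitrary action on the new nodes."""
    s_fine = []
    for i, action in enumerate(s_coarse):
        s_fine.append(action)
        if i + 1 < len(s_coarse):
            s_fine.append(default_action)
    return s_fine


def full_multilevel(n_levels, solve_level, alpha, beta, v, c, h, eps,
                    default_a=0, default_b=0):
    """Coarse-to-fine loop in the spirit of Algorithm 4.

    n_levels = L_p + 1; (alpha, beta, v) live on the coarsest level at entry.
    solve_level(level, alpha, beta, v, tolerance) must return (alpha, beta, v).
    """
    for level in range(n_levels - 1, 0, -1):
        h_l = (2 ** level) * h
        alpha, beta, v = solve_level(level, alpha, beta, v, c * h_l ** 2)
        v = interpolate_value(v)
        alpha = interpolate_strategy(alpha, default_a)
        beta = interpolate_strategy(beta, default_b)
    return solve_level(0, alpha, beta, v, eps)
```

The tolerance `c * h_l ** 2` on the coarse levels reflects the idea that there is little point in solving a coarse problem more accurately than its discretization error.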



Figure 2 illustrates the FAMGπ algorithm when V-cycles are used in AMGπ. The dashed lines
represent the interpolation of the solution and strategies from a coarse grid $X_l$ to the next fine
grid $X_{l-1}$. The continuous V-lines are the V-cycles of AMGπ, which are not fixed in number since
at each level, AMGπ cycles are performed until a given criterion is attained.

Note that our FAMGπ program only applies to stochastic differential games since, for them,
coarse representations, including equations and strategies, can be easily constructed by taking
different sizes of discretization step.

For one-player discounted games with an infinite number of actions and under regularity and strong
convexity assumptions, it is shown in [2] that this kind of full multi-level policy iteration has a
computing time of the order of the cardinality of $X$.

6 Numerical results 

In this section, we apply our programs AMGπ and FAMGπ, which were implemented in C, to
examples of two player zero-sum stochastic differential games. Let us first give some details about
the implementation of the algorithms that we use and some notations for the numerical results.

Figure 3: Graph of $\sin(x_1) \times \sin(x_2)$ on $[0,1] \times [0,1]$.

The AMG linear solver of AMGπ implements the construction phase, including the coarsening
scheme and the interpolation operator, described in [53] and the general recursive multigrid cycle
for the solution phase (see Algorithm 3). In the tests, W(1,1)-cycles were used and the chosen
smoother is a CF relaxation method, that is a Gauss-Seidel relaxation scheme that relaxes first on
C-points and then on F-points. The AMGπ program is the implementation of the method explained
in Section 5 with the above AMG linear solver. The FAMGπ program is the implementation of
Algorithm 4.

The following notations are used in the tables: $s$ denotes the iteration over MAX policies and
$k_{\max}$ is the corresponding number of iterations for MIN policies, that is the number of linear
systems solved at iteration $s$. The residual error of the game is denoted by $r_v = F(v) - v$ and the
exact error, when known, by $e = F(v) - u$ where $u$ is the discretized exact solution of the game.
The infinity norm and the discrete $L_2$ norm are given for each of them.



6.1 Isaacs equations 

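The first example concerns a diffusion problem where the value $v : X \to \mathbb{R}$ of the game is the solution
of the following Isaacs PDE:

$$\begin{cases} \max_{a \in A} \min_{b \in B} \Big( \Delta v(x) + (a \cdot \nabla v(x)) - (b \cdot \nabla v(x)) - \lambda v(x) + \dfrac{\|b\|_2^2}{2} + f(x) \Big) = 0 & x \text{ in } X , \\ v(x) = \psi_1(x) & x \text{ in } \partial X , \end{cases} \tag{23}$$

where $X = ]0,1[ \times ]0,1[$ is the unit square, $A = \{ a \in \mathbb{R}^2 \mid \|a\|_2 \le 1 \}$, $B = \mathbb{R}^2$, $\psi_1(x_1, x_2) = \sin(x_1) \times
\sin(x_2)$ for $(x_1, x_2) \in \partial X$, and $f(x) = -(\Delta u(x) + \|\nabla u(x)\|_2 - 0.5 \|\nabla u(x)\|_2^2 - \lambda u(x))$ with $u(x_1, x_2) =
\sin(x_1) \times \sin(x_2)$ for $x = (x_1, x_2) \in X$. Note that the exact solution is $v(x_1, x_2) = \sin(x_1) \times \sin(x_2)$
on $[0,1] \times [0,1]$ and is represented in Figure 3. Indeed, by convex duality (or computation of
Fenchel-Legendre transformations [52]), we have that

$$\|u\|_2 = \max_{\|a\|_2 \le 1, \ a \in \mathbb{R}^d} a \cdot u \quad \text{and} \quad \frac{1}{2} \|u\|_2^2 = \max_{b \in \mathbb{R}^d} \Big( b \cdot u - \frac{1}{2} \|b\|_2^2 \Big) \quad \text{for all } u \in \mathbb{R}^d ,$$

and $a = \frac{u}{\|u\|_2}$ and $b = u$ are optimal solutions in these equations.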
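The two duality identities, and the way they make $u$ an exact solution of (23), can be checked by the following short computation. It is a worked verification using only the Cauchy-Schwarz inequality and a completed square; the symbol $p$ is introduced here for $\nabla u(x)$.

```latex
% First identity: for \|a\|_2 \le 1, Cauchy-Schwarz gives
%   a \cdot p \le \|a\|_2 \|p\|_2 \le \|p\|_2 ,
% with equality for a = p / \|p\|_2 when p \neq 0.
\max_{\|a\|_2 \le 1} a \cdot p = \|p\|_2 .

% Second identity: completing the square,
b \cdot p - \tfrac12 \|b\|_2^2
  = \tfrac12 \|p\|_2^2 - \tfrac12 \|b - p\|_2^2
  \le \tfrac12 \|p\|_2^2 ,
\qquad \text{with equality for } b = p .

% Hence, with p = \nabla u(x),
\max_{a \in A} \min_{b \in B}
  \Big( a \cdot p - b \cdot p + \tfrac12 \|b\|_2^2 \Big)
  = \|p\|_2 - \tfrac12 \|p\|_2^2 ,

% and the left-hand side of (23) evaluated at v = u reads
\Delta u + \|\nabla u\|_2 - \tfrac12 \|\nabla u\|_2^2 - \lambda u + f = 0
\quad \text{by the definition of } f .
```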



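To solve Equation (23), we first discretize the domain $[0,1] \times [0,1]$ on a grid with $m + 1$ points in
each direction, i.e. with a discretization step $h = \frac{1}{m}$, and we obtain a discrete space $X_h$ with
boundary $\partial X_h$. We denote $x_i = ih$ with $i = 0, \dots, m$, such that $X_h = \{ (x_i, x_j) \mid i, j \in \{1, \dots, m-1\} \}$
and $\partial X_h = \{ (x_i, x_j) \mid i \in \{0, m\}, j \in \{0, \dots, m\} \text{ or } j \in \{0, m\}, i \in \{0, \dots, m\} \}$. Then, using the
discretization scheme (16)-(17), Equation (23) becomes for $(x_i, x_j) \in X_h$:

$$\begin{aligned} 0 = \max_{(a_1,a_2) \in A} \min_{(b_1,b_2) \in B} \Big\{ & \frac{-4 v(x_i,x_j) + v(x_{i+1},x_j) + v(x_{i-1},x_j) + v(x_i,x_{j+1}) + v(x_i,x_{j-1})}{h^2} \\ & + (a_1 - b_1) \frac{v(x_{i+1},x_j) - v(x_i,x_j)}{h} \mathbb{1}_{(a_1-b_1)>0} + (a_1 - b_1) \frac{v(x_i,x_j) - v(x_{i-1},x_j)}{h} \mathbb{1}_{(a_1-b_1)<0} \\ & + (a_2 - b_2) \frac{v(x_i,x_{j+1}) - v(x_i,x_j)}{h} \mathbb{1}_{(a_2-b_2)>0} + (a_2 - b_2) \frac{v(x_i,x_j) - v(x_i,x_{j-1})}{h} \mathbb{1}_{(a_2-b_2)<0} \\ & + \frac{b_1^2 + b_2^2}{2} - \lambda v(x_i,x_j) + f(x_i,x_j) \Big\} . \end{aligned}$$

Multiplying by $\frac{h^2}{c}$, where $c = 4 + h|a_1 - b_1| + h|a_2 - b_2| > 0$, dividing by $1 + \frac{h^2}{c}\lambda$ and adding $v(x_i, x_j)$ on both sides, we
obtain:

$$\begin{aligned} v(x_i,x_j) = \max_{(a_1,a_2) \in A} \min_{(b_1,b_2) \in B} \frac{1}{1 + \frac{h^2}{c}\lambda} \Big\{ & \Big( \frac{1}{c} + \frac{h}{c}(a_1 - b_1) \mathbb{1}_{(a_1-b_1)>0} \Big) v(x_{i+1},x_j) + \Big( \frac{1}{c} - \frac{h}{c}(a_1 - b_1) \mathbb{1}_{(a_1-b_1)<0} \Big) v(x_{i-1},x_j) \\ & + \Big( \frac{1}{c} + \frac{h}{c}(a_2 - b_2) \mathbb{1}_{(a_2-b_2)>0} \Big) v(x_i,x_{j+1}) + \Big( \frac{1}{c} - \frac{h}{c}(a_2 - b_2) \mathbb{1}_{(a_2-b_2)<0} \Big) v(x_i,x_{j-1}) \\ & + \frac{h^2}{c} \frac{b_1^2 + b_2^2}{2} + \frac{h^2}{c} f(x_i,x_j) \Big\} \quad \text{for } (x_i,x_j) \in X_h , \end{aligned} \tag{24}$$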

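where $v(x_i, x_j)$ is replaced by $\psi_1(x_i, x_j)$ for $(x_i, x_j) \in \partial X_h$. This equation has the form of
Equation (5) with a discount factor $\mu$ equal to $(1 + \frac{h^2}{c}\lambda)^{-1} < 1$; the transition probabilities from $(x_i, x_j) \in X_h$
to $(x_{i'}, x_{j'}) \in X_h$ are given by:

$$p\big((x_{i'}, x_{j'}) \mid (x_i, x_j), (a_1, a_2), (b_1, b_2)\big) = \begin{cases} \frac{1}{c} + \frac{h}{c}(a_1 - b_1) \mathbb{1}_{(a_1-b_1)>0} & \text{if } i' = i+1, \ j' = j , \\ \frac{1}{c} - \frac{h}{c}(a_1 - b_1) \mathbb{1}_{(a_1-b_1)<0} & \text{if } i' = i-1, \ j' = j , \\ \frac{1}{c} + \frac{h}{c}(a_2 - b_2) \mathbb{1}_{(a_2-b_2)>0} & \text{if } i' = i, \ j' = j+1 , \\ \frac{1}{c} - \frac{h}{c}(a_2 - b_2) \mathbb{1}_{(a_2-b_2)<0} & \text{if } i' = i, \ j' = j-1 , \\ 0 & \text{else} , \end{cases} \tag{25}$$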



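and the running cost is, for $(x_i, x_j) \in X_h$:

$$\begin{aligned} r\big((x_i,x_j), (a_1,a_2), (b_1,b_2)\big) = \mu \Big\{ & \frac{h^2}{c} \Big( \frac{b_1^2 + b_2^2}{2} + f(x_i,x_j) \Big) \\ & + \Big( \frac{1}{c} + \frac{h}{c}(a_1 - b_1) \mathbb{1}_{(a_1-b_1)>0} \Big) \psi_1(x_{i+1},x_j) \mathbb{1}_{(x_{i+1},x_j) \in \partial X_h} \\ & + \Big( \frac{1}{c} - \frac{h}{c}(a_1 - b_1) \mathbb{1}_{(a_1-b_1)<0} \Big) \psi_1(x_{i-1},x_j) \mathbb{1}_{(x_{i-1},x_j) \in \partial X_h} \\ & + \Big( \frac{1}{c} + \frac{h}{c}(a_2 - b_2) \mathbb{1}_{(a_2-b_2)>0} \Big) \psi_1(x_i,x_{j+1}) \mathbb{1}_{(x_i,x_{j+1}) \in \partial X_h} \\ & + \Big( \frac{1}{c} - \frac{h}{c}(a_2 - b_2) \mathbb{1}_{(a_2-b_2)<0} \Big) \psi_1(x_i,x_{j-1}) \mathbb{1}_{(x_i,x_{j-1}) \in \partial X_h} \Big\} . \end{aligned}$$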


Note that when $i, j \in \{2, \dots, m-2\}$ the sum of the transition probabilities from $(x_i, x_j)$ to the
points of $X_h$ equals 1; when $i$ or $j$ is in $\{1, m-1\}$ this sum is strictly less than 1. Hence, the

Table 1: Numerical results for equation (23) on a 1025 x 1025 points grid

Policy iteration with LU

s    kmax   ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    3      8.51e-7       5.96e-7      4.47e-2     2.48e-2    1.40e+2
2    2      2.44e-8       6.16e-9      1.84e-4     1.05e-4    2.31e+2
3    1      7.38e-13      2.03e-13     4.13e-6     2.16e-6    2.77e+2

AMGπ

s    kmax   ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    3      8.51e-7       5.96e-7      4.47e-2     2.48e-2    2.65e+1
2    2      2.44e-8       6.16e-9      1.84e-4     1.05e-4    4.59e+1
3    1      7.92e-13      2.02e-13     4.13e-6     2.16e-6    5.56e+1


Figure 4: Comparison between AMGπ and the policy iteration algorithm with an LU solver for solving
equation (23) when increasing the size of the problem (x axis: number of discretization nodes, from 200^2 to 1400^2; curves: PI with LU (UMFPACK) and PI with AMG).



matrix $M^{\alpha\beta}$ in (21) is substochastic, and since it is irreducible, it has a spectral radius strictly
less than one. So even when $\lambda = 0$ or equivalently $\mu = 1$, the system (21) has a unique solution
and the dynamic programming equation also has a unique solution. Hence, we shall take $\lambda = 0$ in
the numerical tests. Note also that for this example, the matrices $M^{\alpha\beta}$ in (21) are not symmetric
but close to symmetric when $h$ is small, since the non-symmetric part corresponds to the first-order
terms in equation (24), which are dominated by the second-order terms when $b$ is optimal in (24).

In Table 1, we present numerical results when equation (23) is discretized on a grid with 1025
points in each direction, i.e. with a discretization step of $h = 1/2^{10}$. The stopping criterion for
the policy iterations is e = 10"^*^. The first table of [I] shows the results of the policy iteration 
algorithm with a direct LU solver (we used the package UMFPACK [21]) and the second part of Table 1
shows the results of AMGπ. We observe that AMGπ solves the problem faster than the policy iterations
with a direct solver (about 56 s against 277 s of total cpu time). In both tables, we see that only three steps on MAX policies are needed (first
column) and a total of six steps on MIN policies (second column), which involves the resolution of
six linear systems. The small number of iterations is due to the fact that the solution is regular.
In Table 2, we show that the computation time is improved when applying FAMGπ with $c = 0.1$
to the same example. In this case, the problem is solved in approximately 18 s.

In Figure 4, we compare the policy iteration algorithm with a direct LU solver (UMFPACK [21])

Table 2: Numerical results for Equation (23) on a 1025 x 1025 points grid, computed by FAMGπ
with c = 10^-1

points per direction   h          s   kmax   ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
3                      5.00e-01   1   2      1.42e-01      1.42e-01     1.07e-01    1.07e-01   << 1
                                  2   1      2.34e-03      2.34e-03     2.45e-04    2.45e-04   << 1
5                      2.50e-01   1   2      5.53e-03      2.84e-03     3.00e-03    1.75e-03   << 1
9                      1.25e-01   1   2      2.40e-04      1.10e-04     8.20e-04    4.46e-04   << 1
17                     6.25e-02   1   2      3.18e-05      7.83e-06     3.36e-04    1.90e-04   1.00e-02
33                     3.12e-02   1   1      5.89e-04      7.08e-05     5.05e-04    1.99e-04   1.00e-02
65                     1.56e-02   1   1      1.69e-04      1.25e-05     1.62e-04    4.67e-05   4.00e-02
129                    7.81e-03   1   1      4.28e-05      2.16e-06     4.73e-05    1.21e-05   1.80e-01
257                    3.91e-03   1   1      1.08e-05      3.77e-07     1.31e-05    6.07e-06   7.50e-01
513                    1.95e-03   1   1      2.70e-06      6.61e-08     7.29e-06    3.56e-06   3.13e+00
1025                   9.77e-04   1   2      1.23e-10      8.13e-13     4.16e-06    2.17e-06   1.85e+01



Figure 5: Number of iterations on MIN policies (i.e. the number of linear systems solved) for solving
equation (23) when increasing the size of the problem, corresponding to Figure 4, for both methods
(AMGπ and policy iteration algorithm with LU) (x axis: number of discretization nodes, from 200^2 to 1400^2).

s    kmax   AMG    ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    2      5,4    2.15e-04      1.52e-04     4.45e-02    2.50e-02   5.00e-02
2    2      4,3    5.97e-06      1.59e-06     2.36e-04    1.43e-04   1.00e-01
3    1      3      3.02e-09      7.?1e-10     6.49e-05    3.44e-05   1.30e-01

Table 3: Numerical results with a 65 x 65 points grid, computed by AMGπ for equation (23)



s    kmax   AMG    ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    2      5,4    5.40e-05      3.80e-05     4.46e-02    2.49e-02   2.30e-01
2    2      4,3    1.53e-06      3.95e-07     2.07e-04    1.23e-04   4.30e-01
3    1      3      4.08e-10      9.65e-11     3.28e-05    1.72e-05   5.40e-01

Table 4: Numerical results with a 129 x 129 points grid, computed by AMGπ for equation (23)



s    kmax   AMG    ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    2      5,4    1.35e-05      9.51e-06     4.47e-02    2.49e-02   1.06e+00
2    2      4,3    3.86e-07      9.86e-08     1.94e-04    1.13e-04   1.98e+00
3    1      3      5.17e-11      1.22e-11     1.65e-05    8.63e-06   2.49e+00

Table 5: Numerical results with a 257 x 257 points grid, computed by AMGπ for equation (23)



s    kmax   AMG    ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    2      5,4    3.39e-06      2.38e-06     4.47e-02    2.48e-02   4.55e+00
2    2      4,3    9.71e-08      2.46e-08     1.87e-04    1.08e-04   8.28e+00
3    1      ?      6.26e-12      1.55e-12     8.26e-06    4.31e-06   1.04e+01

Table 6: Numerical results with a 513 x 513 points grid, computed by AMGπ for equation (23)



s    kmax   AMG    ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    2      5,4    8.48e-07      5.95e-07     4.47e-02    2.48e-02   1.85e+01
2    2      4,3    2.43e-08      6.15e-09     1.83e-04    1.05e-04   3.40e+01
3    ?      ?      7.40e-13      2.02e-13     4.13e-06    2.16e-06   4.27e+01

Table 7: Numerical results with a 1025 x 1025 points grid, computed by AMGπ for equation (23)



s    kmax   AMG    ||r_v||_inf   ||r_v||_L2   ||e||_inf   ||e||_L2   cpu time (s)
1    2      5,4    2.12e-07      1.49e-07     4.47e-02    2.48e-02   7.46e+01
2    2      4,3    6.09e-09      1.54e-09     1.82e-04    1.04e-04   1.38e+02
3    1      3      1.13e-13      3.04e-14     2.07e-06    1.08e-06   1.72e+02

Table 8: Numerical results with a 2049 x 2049 points grid, computed by AMGπ for equation (23)



and AMGπ for solving equation (24), when increasing by one the number of discretization points
in each direction from $m = 5$ to $m = 1500$. The stopping criterion for the policy iterations is
$\epsilon = 10^{-10}$. In Figure 5, we represent the corresponding number of iterations on MIN policies, i.e.
the number of linear systems solved for each size of problem; this number is the same for both
methods. We can see that most of the computation time for the resolution of the non-linear
equation (24) is used to solve the linear systems involved in the policy iteration. We also
remark that the computation time for AMGπ seems to grow linearly with the size of the problem.

Tables 3 to 8 contain numerical results for Equation (23) discretized on grids with
discretization step $h = \frac{1}{64}$, $h = \frac{1}{128}$, $h = \frac{1}{256}$, $h = \frac{1}{512}$, $h = \frac{1}{1024}$ and $h = \frac{1}{2048}$ respectively. For these tests,
the stopping criterion for the policy iterations is $\epsilon = 0.001 \, h^2$ where $h$ is the discretization step.
The stopping criterion for the linear solver AMG is ||r||2 < 10^^^ where r is the residual for the 
linear system. For each line of the tables, the third column, named AMG, contains the number of
iterations needed by AMG for solving each linear system ($k_{\max}$ systems per line). We can see that
the number of iterations of AMG is independent of the size of the problem. Note that the norm
of the error $\|e\|$ decreases slowly when the grid becomes finer. This is because the exact solution
(Figure 3) is smooth, so that a small number of points is sufficient to get a good approximation; also,
the non-linearity of the problem gives a worse approximation than one might expect in the linear
case. But a smooth solution is generally more difficult for linear iterative solvers.

6.2 Optimal stopping game

The next tests concern an optimal stopping time game where the value $v : X \to \mathbb{R}$ of the game is the
solution of the variational inequality:



max { min I 0.5Aw(a;) - (6 • Wv{x)) + Ul + f(^x) ] , tp2{x) - v{x) 



aeA beB 



^ v{x) = Vi(x) 



0) 



= X in A" 



X in dX 



(26) 



where $X = ]0,1[ \times ]0,1[$ is the unit square, the sets $A = \{0, 1\}$, $B = \mathbb{R}^2$, $\psi_2(x_1,x_2) = 0$ for $(x_1,x_2) \in X$, and, for $(x_1,x_2) \in X$:



f{xi,X2) 



(0.5Au(a;i, X2) - 0.5 |j Vu(a;i, xzJUa) if 2^2 > (a^i " 0.5)^ 

else , 



0.1 



0.5Am(xi, ^2) — 0.5 ||VM(a;i,a;2;ii2 
and for $(x_1,x_2) \in \partial X$: $\psi_1(x_1,x_2) = u(x_1,x_2)$, where

\[
u(x_1,x_2) =
\begin{cases}
\big(x_2 - ((x_1 - 0.5)^2 + 0.1)\big)^3 & \text{if } x_2 > (x_1 - 0.5)^2 + 0.1, \\
0 & \text{else.}
\end{cases}
\]



The functions $f$, $\psi_1$ and $\psi_2$ are chosen such that the function $u$, represented in Figure 6, is a solution of (26) almost everywhere, and such that the terms ① and ② in Equation (26) are non-positive for all $x \in X$ (this condition must hold for the variational inequality to be well-defined). This example leads to a free boundary problem for the actions of MAX. Indeed, the points of the state space $X_h$ can be divided in two parts: the points where MAX chooses action 1 (he continues to play) and the points where MAX chooses action 0 (he stops the game). For $(x_1,x_2) \in X_h$, the optimal strategy $\alpha$ for MAX is $\alpha(x_1,x_2) = 1$ if $x_2 > (x_1 - 0.5)^2 + 0.1$ and $\alpha(x_1,x_2) = 0$ otherwise.
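The reference solution and the optimal strategy of MAX described above can be tabulated on a small grid. The following is only an illustrative sketch: the function names `exact_u`, `optimal_alpha` and `grid` are hypothetical and not taken from the paper, and the grid size is chosen arbitrarily.

```python
import numpy as np

def exact_u(x1, x2):
    """Reference solution u of (26): a cubic above the parabola, zero below it."""
    gap = x2 - ((x1 - 0.5) ** 2 + 0.1)
    return np.where(gap > 0.0, gap ** 3, 0.0)

def optimal_alpha(x1, x2):
    """Action of MAX: 1 (continue) strictly above the parabola, 0 (stop) elsewhere."""
    return (x2 > (x1 - 0.5) ** 2 + 0.1).astype(int)

def grid(m):
    """Grid with m + 1 points in each direction on the unit square, step h = 1/m."""
    pts = np.linspace(0.0, 1.0, m + 1)
    return np.meshgrid(pts, pts, indexing="ij")

# Small illustrative grid (m = 8); the tests in the text use m = 1024.
x1, x2 = grid(8)
alpha = optimal_alpha(x1, x2)
u = exact_u(x1, x2)
```

On such a grid, `u` vanishes wherever `alpha` is 0, which is the region where MAX stops and receives $\psi_2 = 0$.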

As in the previous example, the domain $X$ is discretized on a grid with $m + 1$ points in each direction, i.e. with a discretization step $h = 1/m$, and we obtain a discrete space $X_h$ with boundary $\partial X_h$. Equation (26) is then discretized using the discretization scheme (16)-(17). Next, the terms ① and ② are simplified separately while keeping equations (14) true. In this case, only term ① is multiplied by $h^2/c$, with $c$ an appropriate constant. After discretization, we obtain the following dynamic programming equation for a game with state space $X_h$:

\[
\begin{aligned}
v(x_i,x_j) = \max\Big\{ \min_{(b_1,b_2) \in B} \Big[
& \tfrac{1}{c}\big(\tfrac{1}{2} - h\,b_1 \mathbb{1}_{b_1<0}\big)\, v(x_{i+1},x_j)
+ \tfrac{1}{c}\big(\tfrac{1}{2} + h\,b_1 \mathbb{1}_{b_1>0}\big)\, v(x_{i-1},x_j) \\
& + \tfrac{1}{c}\big(\tfrac{1}{2} - h\,b_2 \mathbb{1}_{b_2<0}\big)\, v(x_i,x_{j+1})
+ \tfrac{1}{c}\big(\tfrac{1}{2} + h\,b_2 \mathbb{1}_{b_2>0}\big)\, v(x_i,x_{j-1}) \\
& + \tfrac{h^2}{c}\,\tfrac{b_1^2 + b_2^2}{2} + \tfrac{h^2}{c}\, f(x_i,x_j) \Big],\ \psi_2(x_i,x_j) \Big\}
\quad \text{for } (x_i,x_j) \in X_h,
\end{aligned}
\]

with $c = 2 + h|b_1| + h|b_2| > 0$, and $v(x_i,x_j) = \psi_1(x_i,x_j)$ for $(x_i,x_j) \in \partial X_h$. The same comments about non-symmetry and the discount factor as for equation (24) hold here; that is, $\lambda = 0$ or equivalently $\mu = 1$.
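To illustrate why the discount factor equals 1 here, the sketch below computes the four neighbour weights of the dynamic programming equation for a fixed action $(b_1, b_2)$ of MIN and checks that they form a probability vector. The function name `transition_weights` and the dictionary keys are hypothetical; the upwind form of the weights is an assumption consistent with the constant $c = 2 + h|b_1| + h|b_2|$ given in the text.

```python
def transition_weights(b1, b2, h):
    """Neighbour weights and reward scaling for a fixed action (b1, b2) of MIN.

    Assumes the upwind weights (1/2 - h*b*1_{b<0}) and (1/2 + h*b*1_{b>0}),
    each divided by c = 2 + h|b1| + h|b2|.
    """
    c = 2.0 + h * abs(b1) + h * abs(b2)
    weights = {
        "i+1": (0.5 - h * min(b1, 0.0)) / c,  # weight of v(x_{i+1}, x_j)
        "i-1": (0.5 + h * max(b1, 0.0)) / c,  # weight of v(x_{i-1}, x_j)
        "j+1": (0.5 - h * min(b2, 0.0)) / c,  # weight of v(x_i, x_{j+1})
        "j-1": (0.5 + h * max(b2, 0.0)) / c,  # weight of v(x_i, x_{j-1})
    }
    reward_scale = h * h / c  # factor applied to (b1^2 + b2^2)/2 + f
    return weights, reward_scale

w, scale = transition_weights(1.5, -0.7, 0.1)
total = sum(w.values())
```

The weights are non-negative and sum to one for any $(b_1, b_2)$, so they can be read as transition probabilities of a Markov chain on the grid with no discounting.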



The numerical results are obtained for Equation (26) discretized on a grid with 1025 points in each direction. In the domain $X_h$, for a fixed strategy $\alpha$ of MAX, we represent a point $x$ in green when $\alpha(x) = 1$, that is where MAX decides to continue playing, and in blue when $\alpha(x) = 0$, that is where MAX decides to stop the game. The optimal strategy for MAX has only green points above the red curve $x_2 = (x_1 - 0.5)^2 + 0.1$ and only blue points below it. We start the tests with $\alpha(x) = 0$ for all $x \in X$, that is with blue points in the whole domain.

Numerical results with AMGπ are shown geometrically in Figure 7, where the strategies of MAX obtained after 100, 200, 300, 400, 500, 600 and 700 iterations are represented. We observe in Table 9 that AMGπ finds an approximation of the solution after 702 iterations and in about
Figure 6: Graph of the solution of equation (26).



Table 9: Numerical results for optimal stopping time game (26) with a 1025 x 1025 points grid,
computed by AMGπ with ε = 10^[illegible].

   s | kmax | ‖r‖∞      | ‖r‖L2     | ‖e‖∞      | ‖e‖L2     | cpu time (s)
   1 |      | 3.645e-01 | 9.195e-03 | 7.243e-01 | 1.998e-01 | 1.790e+00
   2 |  4   | 1.497e-01 | 1.347e-03 | 3.782e-01 | 1.218e-01 | 1.376e+01
   3 |  4   | 1.094e-01 | 8.839e-04 | 3.767e-01 | 1.213e-01 | 2.492e+01
 ...
 100 |  3   | 1.744e-02 | 4.444e-05 | 2.392e-01 | 8.016e-02 | 1.009e+03
 ...
 200 |  3   | 7.398e-03 | 1.879e-05 | 1.222e-01 | 3.996e-02 | 2.214e+03
 ...
 300 |  3   | 2.510e-03 | 8.779e-06 | 5.614e-02 | 1.728e-02 | 3.619e+03
 ...
 400 |  2   | 1.258e-03 | 4.363e-06 | 2.321e-02 | 6.519e-03 | 4.770e+03
 ...
 500 |  2   | 4.761e-04 | 1.620e-06 | 6.601e-03 | 1.532e-03 | 5.861e+03
 ...
 600 |  2   | 8.857e-05 | 2.781e-07 | 7.274e-04 | 9.598e-05 | 7.045e+03
 ...
 650 |  2   | 1.533e-05 | 4.231e-08 | 1.538e-04 | 6.331e-05 | 7.630e+03
 ...
 700 |  1   | 5.647e-08 | 8.734e-11 | 1.571e-04 | 6.619e-05 | 8.134e+03
 701 |  1   | 1.207e-08 | 2.267e-11 | 1.571e-04 | 6.619e-05 | 8.141e+03
 702 |  1   | 9.992e-16 | 7.284e-17 | 1.571e-04 | 6.619e-05 | 8.148e+03
Figure 7: Application of AMGπ to the free boundary problem (26) for a 1025 x 1025 points grid: (a) after 100 iterations, (b) after 200 iterations, (c) after 300 iterations, (d) after 400 iterations, (e) after 500 iterations, (f) after 600 iterations and (g) after 700 iterations.
Table 10: Numerical results for optimal stopping time game (26) with a 1025 x 1025 points grid,
computed by FAMGπ with c = 10^[illegible] and ε = 10^[illegible].

 s | kmax | ‖r‖∞             | ‖r‖L2    | ‖e‖∞     | ‖e‖L2    | cpu time (s)

 points in each direction: 3, step size: 5.00e-01
   |      | 2.17e-01         | 2.17e-01 | 1.53e-01 | 1.53e-01 | ≪ 1
   |      | 2.64e-05         | 2.64e-05 | 3.92e-02 | 3.92e-02 | ≪ 1
 points in each direction: 5, step size: 2.50e-01
   |      | 2.19e-04         | 8.41e-05 | 3.02e-02 | 1.71e-02 | ≪ 1
 points in each direction: 9, step size: 1.25e-01
   |      | 4.99e-03         | 1.06e-03 | 1.65e-02 | 7.99e-03 | ≪ 1
   |      | 2.68e-03         | 5.41e-04 | 1.66e-02 | 8.15e-03 | ≪ 1
   |      | 2.72e-04         | 5.49e-05 | 1.68e-02 | 8.30e-03 | ≪ 1
 points in each direction: 17, step size: 6.25e-02
   |      | 2.26e-03         | 5.44e-04 | 8.75e-03 | 3.89e-03 | ≪ 1
   |      | 7.97e-04         | 1.23e-04 | 8.84e-03 | 3.97e-03 | ≪ 1
   |      | 4.65e-04         | 5.97e-05 | 8.98e-03 | 4.11e-03 | ≪ 1
   |      | 9.57e-08         | 1.24e-08 | 9.01e-03 | 4.14e-03 | 1.00e-02
 points in each direction: 33, step size: 3.12e-02
   |      | 2.10e-04         | 1.90e-05 | 4.94e-03 | 2.16e-03 | 1.00e-02
   |      | 1.05e-04         | 6.57e-06 | 4.76e-03 | 2.09e-03 | 2.00e-02
 points in each direction: 65, step size: 1.56e-02
   |      | 6.26e-05         | 6.43e-06 | 2.49e-03 | 1.07e-03 | 4.00e-02
   |      | 3.64e-05         | 2.09e-06 | 2.45e-03 | 1.05e-03 | 7.00e-02
 points in each direction: 129, step size: 7.81e-03
   |      | 7.67e-06         | 3.88e-07 | 1.25e-03 | 5.33e-04 | 1.60e-01
 points in each direction: 257, step size: 3.91e-03
   |      | 2.86e-06         | 1.12e-07 | 6.28e-04 | 2.66e-04 | 6.20e-01
 points in each direction: 513, step size: 1.95e-03
   |      | 5.33e-07         | 1.44e-08 | 3.15e-04 | 1.33e-04 | 2.49e+00
 points in each direction: 1025, step size: 9.77e-04
   |      | 1.79e-07         | 3.82e-09 | 1.57e-04 | 6.62e-05 | 1.58e+01
   |      | 9.66e-08         | 8.84e-10 | 1.57e-04 | 6.62e-05 | 2.30e+01
   |      | 5.39e-08         | 4.10e-10 | 1.57e-04 | 6.62e-05 | 3.00e+01
   |      | 2.86e-08         | 1.31e-10 | 1.57e-04 | 6.62e-05 | 3.70e+01
   |      | 7.41e-09         | 1.60e-11 | 1.57e-04 | 6.62e-05 | 4.34e+01
   |      | [illegible]e-16  | 7.31e-17 | 1.57e-04 | 6.62e-05 | 4.99e+01
Figure 8: Application of FAMGπ to the free boundary problem (26) for: (a) 9 x 9 points grid, (b) 17 x 17 points grid, (c) 33 x 33 points grid, (d) 65 x 65 points grid.



two hours and 15 minutes. The stopping criterion for the policy iterations of AMGπ in this test is $\epsilon = 10^{[\text{illegible}]}$. This criterion was chosen to ensure the convergence of the policy iterations: with a smaller $\epsilon$ they did not converge, because the internal policy iterations did not give a precise enough approximation.

In Table 10 we present numerical results for the application of FAMGπ, with $c = 10^{[\text{illegible}]}$ and $\epsilon = [\text{illegible}]$, to problem (26) on a 1025 x 1025 points grid. We observe that our algorithm solves the problem in about 49 seconds. A geometrical representation of the strategies of MAX obtained by AMGπ on four successive levels of the FAMGπ algorithm is shown in Figure 8. We can see that on coarse grids, the algorithm finds a good approximation of the solution in a few iterations. The interpolation of this solution and the corresponding strategies are used to start AMGπ on the next finer level, and we observe that only a few policy iterations are needed on each level.

With this example we show the advantage of using FAMGπ. Indeed, the computation time of the FAMGπ algorithm seems to be of the order of the number of discretization points, whereas that of the AMGπ algorithm is about 160 times greater. This is due to the large number of iterations needed by AMGπ to solve this kind of game. Indeed, this number should be compared to the diameter of the graph associated to the corresponding game problem (that is, the largest number of edges that must be covered to travel from one point to another), for instance the union of all the graphs of the Markov chains associated to all pairs of fixed policies $\alpha$ and $\beta$. Because of the finite differences discretization, the arcs of these graphs are supported by the edges of the grid $X_h$ in $\mathbb{Z}^2$, so the diameter is $2m$ with $m = 1024$.
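The value $2m$ for the diameter can be checked on small grids by brute force. The sketch below assumes that the graph is the usual four-neighbour grid graph on $(m+1) \times (m+1)$ points; the function name `grid_diameter` is hypothetical, and the all-pairs breadth-first search is only practical for small $m$.

```python
from collections import deque

def grid_diameter(m):
    """Diameter of the 4-neighbour grid graph with (m + 1) x (m + 1) nodes."""
    n = m + 1

    def eccentricity(start):
        # Breadth-first search: distance (in edges) from start to every node.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nb = (i + di, j + dj)
                if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in dist:
                    dist[nb] = dist[(i, j)] + 1
                    queue.append(nb)
        return max(dist.values())

    # The diameter is the largest eccentricity over all starting nodes.
    return max(eccentricity((i, j)) for i in range(n) for j in range(n))

small = grid_diameter(4)
```

For $m = 4$ this gives 8, i.e. $2m$, the distance between two opposite corners of the grid.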



6.3 Stopping game with two optimal stopping 

In this example, we consider a stopping game where both players have the possibility to stop the 
game, see [3S| for a complete theory about this subject. In this case, the value of the game starting 
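in $x \in X$ is given by:

\[
v(x) = \sup_{\kappa_1} \inf_{\kappa_2} \mathbb{E}_x \Big[ \int_0^{\kappa_1 \wedge \kappa_2} r(\xi_t)\, dt + \psi_1(\xi_{\kappa_1})\, \mathbb{1}_{\kappa_1 < \kappa_2} + \psi_2(\xi_{\kappa_2})\, \mathbb{1}_{\kappa_2 < \kappa_1} \Big]
\]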
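where $\kappa_1 \wedge \kappa_2 = \min(\kappa_1, \kappa_2)$ and we assume $\min(\kappa_1, \kappa_2) < \tau$, with $\tau = \inf\{ t > 0 \mid \xi_t \notin X \}$. Then $v$ is a solution of the equation:

\[
\max\big\{ \psi_1(x) - v(x),\ \min\{ \psi_2(x) - v(x),\ L(v;x) + r(x) \} \big\} = 0 \quad \text{for } x \text{ in } X, \tag{27}
\]

or equivalently,

\[
\begin{cases}
(L(v;x) + r(x))\,(w(x) - v(x)) \le 0 & \text{for } x \in X, \\
\forall w,\ \psi_1 \le w \le \psi_2 \ \text{and}\ \psi_1 \le v \le \psi_2,
\end{cases}
\]

that is

\[
\begin{cases}
L(v;x) + r(x) \le 0 & \text{if } v(x) = \psi_1(x), \\
L(v;x) + r(x) \ge 0 & \text{if } v(x) = \psi_2(x), \\
L(v;x) + r(x) = 0 & \text{if } \psi_1(x) < v(x) < \psi_2(x).
\end{cases}
\]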

For the numerical tests, we consider the stochastic differential game whose value $v$ is a solution of:

\[
\max\big\{ \psi_1(x) - v(x),\ \min\{ \psi_2(x) - v(x),\ 0.5\,\Delta v(x) + r(x) \} \big\} = 0 \quad \text{for } x \text{ in } X, \tag{28}
\]

where $X = [0,1]$ and, for all $x \in X$: $\psi_1(x) = -\bar\psi_2$, $\psi_2(x) = \bar\psi_2$ with $\bar\psi_2 = (2\cos(0.09\pi) + \pi(0.18 - 1)\sin(0.09\pi))/2 \approx 0.6$, and $r(x) = 0.5\,\pi^2 \cos(\pi x)$. For all $x \in X$, the sets of actions are $A = \{0, 1\}$ for MAX and $B = \{0, 1\}$ for MIN, where action 0 means that the player chooses to stop the game and receive $\psi_1$ when MAX stops or $\psi_2$ when MIN stops, and action 1 means that the game continues.



Here, the exact solution of Equation (28) in the viscosity sense is, for $x \in X$,

\[
v(x) =
\begin{cases}
\psi_1(x) & \text{for } x \ge 1 - 0.09, \\
\psi_2(x) & \text{for } x \le 0.09, \\
\cos(\pi x) + \pi \sin(0.09\pi)\, x + c & \text{for } 0.09 < x < 1 - 0.09,
\end{cases}
\]

where the constant $c = \bar\psi_2 - \cos(0.09\pi) - 0.09\pi \sin(0.09\pi)$; this solution is represented in Figure 9. For all $x \in X$, the optimal strategy for MAX is $\alpha(x) = 0$ if $x > 1 - 0.09$ and $\alpha(x) = 1$ otherwise. For all $x \in X$, the optimal strategy for MIN is $\beta(x) = 0$ if $x < 0.09$ and $\beta(x) = 1$ otherwise.
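The closed-form solution above can be checked numerically. The following sketch evaluates the constants, the exact solution and the left-hand side of Equation (28), using the analytic second derivative of $v$ away from the two free boundaries. All names (`A`, `PSI1`, `PSI2`, `C`, `v_exact`, `half_laplacian`, `residual`) are hypothetical helpers introduced for illustration.

```python
import math

A = 0.09  # position of the left free boundary; the right one is at 1 - A
PSI2 = (2 * math.cos(A * math.pi) + math.pi * (2 * A - 1) * math.sin(A * math.pi)) / 2
PSI1 = -PSI2
C = PSI2 - math.cos(A * math.pi) - A * math.pi * math.sin(A * math.pi)

def r(x):
    """Running reward r(x) = 0.5 * pi^2 * cos(pi * x)."""
    return 0.5 * math.pi ** 2 * math.cos(math.pi * x)

def v_exact(x):
    """Exact solution of (28): obstacle values outside ]A, 1 - A[, smooth in between."""
    if x <= A:
        return PSI2
    if x >= 1 - A:
        return PSI1
    return math.cos(math.pi * x) + math.pi * math.sin(A * math.pi) * x + C

def half_laplacian(x):
    """0.5 * v''(x), valid away from the free boundaries x = A and x = 1 - A."""
    if A < x < 1 - A:
        return -0.5 * math.pi ** 2 * math.cos(math.pi * x)
    return 0.0

def residual(x):
    """Left-hand side of (28) evaluated at the exact solution."""
    v = v_exact(x)
    return max(PSI1 - v, min(PSI2 - v, half_laplacian(x) + r(x)))
```

With these definitions, `residual` vanishes in the three regions (MIN stops, both continue, MAX stops), and `v_exact` is continuous across both free boundaries.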



We present numerical results for the discretization of Equation (28) on a grid with 2049 points: in Table 11 when using AMGπ with $\epsilon = 10^{-10}$, and in Table 12 when using FAMGπ with $c = 10^{[\text{illegible}]}$ and $\epsilon = 10^{-10}$. As in the previous example, we see the advantage of using FAMGπ for this kind of game. Indeed, FAMGπ solves the problem in about one second while AMGπ needs about 24 minutes. As for the previous example, the computation time of FAMGπ seems to be of the order of the number of discretization points. For this example, because of the finite differences discretization, the diameter of the graph is $m$ with $m = 2048$. We see in Table 11 that the numbers of both internal and external policy iterations for AMGπ are of the order of the diameter of the graph.
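As a rough illustration of how a discretization of Equation (28) can be solved on a coarse grid, the sketch below uses a plain Jacobi-type value iteration. This is not the AMGπ or FAMGπ algorithm of the paper, only a simple fixed-point scheme. It assumes the standard three-point finite difference for $0.5\,\Delta v$ and Dirichlet values $v(0) = \psi_2$, $v(1) = \psi_1$ taken from the exact solution; all names are hypothetical.

```python
import numpy as np

A = 0.09
PSI2 = (2 * np.cos(A * np.pi) + np.pi * (2 * A - 1) * np.sin(A * np.pi)) / 2
PSI1 = -PSI2

def exact(x):
    """Exact solution of (28), used only to measure the error."""
    c = PSI2 - np.cos(A * np.pi) - A * np.pi * np.sin(A * np.pi)
    mid = np.cos(np.pi * x) + np.pi * np.sin(A * np.pi) * x + c
    return np.where(x <= A, PSI2, np.where(x >= 1 - A, PSI1, mid))

def value_iteration(m, sweeps):
    """Iterate v_i <- max(psi1, min(psi2, (v_{i-1} + v_{i+1})/2 + h^2 r_i))."""
    x = np.linspace(0.0, 1.0, m + 1)
    h = 1.0 / m
    r = 0.5 * np.pi ** 2 * np.cos(np.pi * x)
    v = np.zeros(m + 1)
    v[0], v[-1] = PSI2, PSI1  # assumed Dirichlet data, taken from the exact solution
    for _ in range(sweeps):
        # Value of continuing, computed from the previous iterate (Jacobi sweep).
        cont = 0.5 * (v[:-2] + v[2:]) + h * h * r[1:-1]
        # MIN may stop at psi2 (upper obstacle), MAX may stop at psi1 (lower obstacle).
        v[1:-1] = np.clip(cont, PSI1, PSI2)
    return x, v

x, v = value_iteration(32, 5000)
err = float(np.max(np.abs(v - exact(x))))
```

Even on this 33-point grid several thousand sweeps are used, since information travels one grid edge per sweep and the averaging operator contracts slowly. This is in line with the remark that iteration counts should be compared with the diameter of the graph.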

7 Conclusion and perspective 

In this paper, we have presented our algorithm AMGπ for solving two player zero-sum stochastic games. This program combines the policy iteration algorithm with algebraic multigrid methods. Our experiments on an Isaacs equation show better results for AMGπ than for policy iteration combined with a direct linear solver. We observed that most of the computation time for solving a non-linear equation (5) is spent solving the linear systems involved in the policy iteration algorithm. We also observed that the computation time of AMGπ seems to grow linearly with the size of the problem.

Furthermore, we also presented a full multi-level algorithm, called FAMGπ, for solving two player zero-sum stochastic differential games. The numerical results presented here on some stochastic differential stopping games show that FAMGπ substantially improves the computation time of the policy iteration algorithm for this kind of game. Indeed, the computation time of FAMGπ seems to be of the order of the number of discretization points, whereas that of the AMGπ algorithm is about 160 to 1700 times greater. This is due to the large number of iterations needed by AMGπ
Figure 9: Solution of Equation (28).

Table 11: Numerical results for optimal stopping time game (28) with a 2049 x 2049 points grid, computed by AMGπ with ε = 10^{-10}.

Table 12: Numerical results for optimal stopping time game (28) with a 2049 x 2049 points grid, computed by FAMGπ with c = 10^[illegible] and ε = 10^{-10}.
for solving this kind of game. Indeed, this number should be compared to the diameter of the graph associated to the corresponding game problem, for instance the union of all the graphs of the Markov chains associated to fixed policies $\alpha$ and $\beta$.

The FAMGπ algorithm uses coarse grid discretizations of the partial differential equation and so cannot be applied directly to the dynamic programming equation of a two player zero-sum stochastic game with finite state space. One may ask whether the FAMGπ algorithm can be adapted to this kind of game. Indeed, the complexity of two player zero-sum stochastic games is still unsettled: one only knows that the problem belongs to the complexity class NP ∩ coNP [49], and any new approach may be useful to understand this complexity.